Recently, I became intrigued with how collaborative text editing works. I was working on a school assignment in a group, and we were using Google Docs to work. It almost seems like magic, how one person can type something, and a few seconds later have it appear on others’ screens without any conflicts. I also recently discovered that the Keynote and Notes apps have collaborative features built in. So I decided to take a deep dive into collaborative text editing.
The problem with collaboration
To edit a purely text-based document, you actually only do 2 basic operations: insert and delete. Any other operations are built on these 2 operations. A single user editing a document might look something like this.
To collaborate in a document, the naive approach would be to send over each operation a user performs to all other users editing the document. This works when the connection is fast, perhaps when the users are connected in a LAN.
However, when network latency becomes significant, it is incredibly easy to achieve race conditions like this one.
When these race conditions occur, the documents on both clients become out of sync, and because of the nature of these operations, it is rare that they will become in sync again. Due to this, we need a better way of structuring these operations, or even structuring the document itself, so that reliable and consistent collaborative editing becomes a reality.
Enter Operational Transformation, or OT for short. OT is an algorithm that will keep documents consistent across all clients, no matter the order of the operations that get sent from other clients. Wave (the collaborative engine behind Google Docs), uses OT. I didn’t dive into OT too deeply, as I found other, simpler, and more intuitive solutions that I will explain later on. But for now, here is a quick rundown on how basic OT works.
Let’s begin by understanding the very basics of OT. As mentioned earlier, a document is simply some content, which can be operated on by either inserting or deleting. The way OT solves the issue of concurrent editing is by transforming each operation using the previous operation, before it is applied to the document. Using the earlier example, after client A performs the insert on their document, client B’s delete operation, which is the next in line to be performed, gets transformed.
After the transformation, we can see that the operation now deletes the correct character from the document, not the wrong one in the naive example. Also, over on client B’s document, client A’s insert operation gets transformed after client B’s delete operation is performed, and the resulting document is the same on both ends.
Downsides to OT
Unfortunately, most of the popular OT algorithms are not always commutative. This means that when applying OT to two separate documents, it is not always guaranteed that they will end up being the same. Although this is rare, it can happen, and this will result in measures put into place to ensure consistency every once in a while. Thus, a different way of doing collaborative editing has surfaced in the recent years, known as CRDTs.
There are actually 2 flavours of CRDTs. They are (bear with me) Commutative Replicated Data Type and Convergent Replicated Data Type. The following will be about the former. CRDTs take a different approach from OT, in that they rely on structuring the document in a certain way, and not really on structuring the operations. This means that most of the time, using the CRDT approach, the document will be stored in a different format then just a simple string. This includes using things like binary trees and special position identifiers.
Downsides to CRDTs
When using CRDTs instead of OT, we are trading time complexity for space complexity. This means that while OT will take more time to process than CRDTs, CRDTs will usually end up using significantly more memory than OT.
Logoot is a great (and simple) example of a CRDT. It is one of the easier CRDTs to understand, and extensions/variations have also been built upon it to enable extra functionality, like group undo/redo. It is the main CRDT that I will be focusing on (duh, look at the title), and also implementing it, which I will talk about in the next story.
Going back to the original example with the race condition, you can see that the document got out of sync because the operations were using a position within the string. So when an insertion happens, everything after the inserted character is shifted to the right, which means their position is incremented.
When a second operation is applied onto both content, they are inserted at the same position, as specified in the operation. However, they occur at different characters as a result of the first operation. For example, in the next image, the original intention is to insert a
y after the
c. However, due to the operation inserting in a fixed position, the
y is inserted after the
Logoot tries to solve this problem by making the positions move with the text instead. That is, it assigns unique identifiers to each character/group of characters in the document, and performs the insertions and deletions based on the position of these identifiers.
In Logoot, the document is split up into groups of text. A group of text is called an
atom. Each atom is given a unique identifier, or a
uid, which has to be generated in a special way that will be covered later. Each
uid is comparable to other
uids. This means that you can do things like ask if
uid2 and vice versa. You can also check if they are equal. The document is then stored as an array of
(uid, atom) pairs, sorted by the
uids within the pairs in ascending order. There will also be two special, blank lines that exist in every document. One at the very start which will have the
MIN_uid, and at the very bottom, which will have the
Phew, that was a lot to take in. Here comes the most important part though. For any two different
uids, you should always be able to find a
uid in between them. This is the property that makes Logoot work. So, the
uids in Logoot need to be represented in a way that adheres to the property.
Let’s start simple. This whole time, we represented the identifiers of each character with its position in the initial string — an integer. However, using integers as
uids does not adhere to the property we mentioned. Let’s say we have the
uids 4 and 5. It is impossible to find a integer in between 4 and 5, and thus integers cannot be used as
At this point, you might be thinking, what about floats? Now, theoretically, using real numbers as
uids would work, as there are an infinite number of real numbers between any two different real numbers. However, in the context of a computer, this isn’t possible. Floating point numbers are simply approximations of real numbers, which means that it’s not always possible to find a floating point number between any two floating point numbers. Simply put, you cannot represent an infinite number of floating point numbers in just 32 or 64-bits.
So how are these
uids represented? The answer is lists. Using lists, we can easily adhere to the property. A
uid is represented by a list of integers. Let’s say we have the
 (Take note that these are lists of a single integer element, and not just the integers). In order to find a
uid in between
, since there is no integer in between
, we can simply make a longer list, that starts with a
[4, 8]). Now, we have the
[4, 8] and
. And this can repeat forever, since lists are infinitely extensible (Of course, the maximum size of a list is limited by the available RAM, but this is big enough that we don’t have to worry.
Therefore, in Logoot,
uids are represented by lists of integers. Take note that there are some additional components that make up a
uid, but for the sake of understanding Logoot they aren’t necessary.
Now that we have covered
uids in Logoot, we can move on the constructing a simple Logoot document.
A sample document
In Logoot, a document is made up of an array of
pair containing a
uid, and an
atom is simply a string, containing some of the content in the document. The
pairs in the document are always sorted by their
uids. To get the content of the entire document, simply concatenate all the
atoms in order. As with all data structures (remember Logoot is a CRDT), some operations can be performed on it.
insert operation can be performed on a Logoot document. To perform an
insert, you simply need to have the
pair to insert. To generate the
uid for the
pair, you simply have to take two existing
uids and generate a random
uid in between them.
delete operation can also be performed. To perform a
deletion, you only need to know the
uid to delete.
Start and end lines
insert operation, you can see that to generate a new
uid, you need 2 existing
uids. However, in a blank document, how would the two existing
uids be found? This is where the start and end lines come into play. The start and end lines are two lines that always exist in a Logoot document. They contain an empty
atom and their
MAX respectively, where
MAX is the maximum size of an integer.
Hopefully this article has given you a good understanding of Logoot, and even just collaborative editing in general. I implemented Logoot in the Go programming language. Check it out here. Be sure to check out my other stories and thanks for reading!