Fast, accurate swipe typing system. Use it today in FUTO Keyboard, our fully-offline Android keyboard app. Or download the weights and build with it.

For a long time, good mobile swipe typing was locked behind closed-source keyboard apps or unlicensed private libraries.

FUTO Swipe is our family of open models and algorithms that aims to solve this problem. We developed it primarily for FUTO Keyboard, but we also welcome the broader community to build with the FUTO Swipe models. As this has been a long-term investment for us, we ask that attribution be made visible to end users. Read the license

Dataset

In August 2024, we launched a dataset collection effort at swipe.futo.org. Users voluntarily visited the webpage on their mobile phones and were given instructions and information about the dataset. After consenting, they were shown sentences, primarily from Wikipedia, and asked to swipe them word by word.

In the end, this produced over 1 million swipes. Approximately x% were filtered out for quality concerns. In December 2024, we released a dataset of 1 million swipes under the MIT license, and it is available today as a HuggingFace dataset.
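
For example, the dataset can be loaded with the HuggingFace datasets library. The dataset ID below is a placeholder for illustration; see our HuggingFace page for the exact name.

```python
# Minimal sketch of loading the swipe dataset with HuggingFace `datasets`.
# "futo-org/swipe" is a placeholder ID; substitute the actual dataset name.
from datasets import load_dataset

dataset = load_dataset("futo-org/swipe")
print(dataset)  # splits, feature names, and row counts
```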

We made heavy use of this data to train our models and to evaluate different swipe typing systems.

Models

Our architecture includes three model types.

The Encoder model is universal: it is layout-agnostic and language-agnostic, and it is used for making swipe typing predictions in the general case. However, it does not offer cutting-edge accuracy on its own.

The ContextLM model is a very small language model that is trained for a single language. It's used to improve the quality of predictions by eliminating nonsensical words given the preceding words in the sentence. It only requires text data for training.

Finally, the decoder is a language-specific and layout-specific model that learns the peculiarities of its language and layout, achieving leading accuracy. As it requires swipe typing data for a given layout and language to train, we only release a QWERTY English decoder.

With all three models combined, we achieve a top-4 accuracy of xx% in the QWERTY English swipe typing case.
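
To make the division of labor concrete, here is a hedged sketch of how the three models' scores might be combined for a single candidate word. The method names, interpolation weight, and fallback logic are illustrative assumptions, not the released implementation.

```python
def score_candidate(word, swipe, context, encoder, decoder, context_lm,
                    lm_weight=0.5):
    """Illustrative scoring of one candidate word (all names hypothetical).

    - encoder/decoder map a swipe path to a word likelihood
    - context_lm scores the word given the preceding words in the sentence
    """
    if decoder is not None:
        # Language- and layout-specific decoder, when one exists.
        swipe_logp = decoder.log_prob(word, swipe)
    else:
        # Universal encoder fallback for any other language/layout.
        swipe_logp = encoder.log_prob(word, swipe)

    # The ContextLM penalizes words that are nonsensical after `context`.
    context_logp = context_lm.log_prob(word, context)

    # `lm_weight` is an assumed interpolation weight.
    return swipe_logp + lm_weight * context_logp
```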

Footprint

The encoder model is just 635,140 parameters, and the decoder adds another 304,155. The biggest of the three is the ContextLM at roughly 1.5 million, although 1.1 million of that is just embeddings. This brings us to 1,364,271 active parameters, or 2,494,767 total parameters.

This means the footprint of the models is very small, and they can run on low-end devices in milliseconds. The environmental cost of training was also very low, because we never needed more than a single workstation GPU!
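
For reference, the parameter accounting works out as follows; the exact ContextLM split below is back-calculated from the totals quoted above, not an official breakdown.

```python
# Back-of-the-envelope check of the parameter counts quoted above.
encoder = 635_140
decoder = 304_155
contextlm_total = 1_555_472        # ~1.5M; inferred from the quoted totals
contextlm_embeddings = 1_130_496   # ~1.1M; inferred from the quoted totals

active = encoder + decoder + (contextlm_total - contextlm_embeddings)
total = encoder + decoder + contextlm_total
print(f"{active:,} active, {total:,} total")
# 1,364,271 active, 2,494,767 total
```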

C++ Library

The models themselves are only half of the story when going from a swipe to word predictions. Raw model outputs are not usable on their own; it is necessary to perform a beam search over a dictionary to score a set of words and find the most likely candidates.
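
As an illustration only, here is a minimal Python sketch of dictionary-constrained beam search over a trie. The real implementation lives in the C++ library described below; `char_log_probs` is a hypothetical stand-in for the models' per-character output.

```python
import heapq

class TrieNode:
    """Minimal dictionary trie node."""
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_word = False

def insert(root, word):
    """Add one dictionary word to the trie."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True

def beam_search(swipe, root, char_log_probs, beam_width=8, max_len=24):
    """Return up to 4 dictionary words best matching the swipe.

    `char_log_probs(prefix, swipe)` is a hypothetical callable returning a
    dict mapping each possible next character to its log-probability; in
    the real system this is where the model outputs come in.
    """
    beams = [(0.0, "", root)]  # (cost = accumulated -log prob, prefix, node)
    completed = []
    for _ in range(max_len):
        candidates = []
        for cost, prefix, node in beams:
            log_probs = char_log_probs(prefix, swipe)
            for ch, child in node.children.items():
                new_cost = cost - log_probs.get(ch, float("-inf"))
                candidates.append((new_cost, prefix + ch, child))
                if child.is_word:
                    completed.append((new_cost, prefix + ch))
        if not candidates:
            break
        beams = heapq.nsmallest(beam_width, candidates)
    completed.sort()
    return [word for _, word in completed[:4]]
```

Constraining the search to a trie built from the dictionary guarantees that every hypothesis is a real word, which helps such small models deliver usable accuracy.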

For this, we release executorch-inference, a C++ library that handles the entire inference, decoding, and beam search pipeline, so you can easily go from a swipe path to word predictions.

Want to build with FUTO Swipe?

The FUTO Swipe models are available under the FUTO Model License, and the inference library is available under the MIT license. You can also read the paper for more in-depth details.