Correcting Text in AAC (or a failed computer scientist figures out how to correct text)
Cooking Up an AAC Correction Tool: A Recipe for Readability
Alright, gather 'round, aspiring communication chefs! We're about to embark on a culinary adventure, one where we cook up something genuinely useful for the world of Augmentative and Alternative Communication (AAC). Ahem - I just LOVE an analogy. I may have taken this too far, but just go with it for now. All the code is at https://github.com/AceCentre/Correct-A-Sentence/tree/main/helper-scripts - I actually wrote this some time ago. Some of the data may be a little outdated, but I figured it may be of interest to someone out there.
Now, before we don our aprons, a little confession: I flunked computer science at uni. Like, properly failed the module. So, if I can bumble my way through making a "Fix Tool" from scratch (and hey, big shout-out to the Grid3 "Fix Tool" for showing us what's possible - long may it reign!), then you absolutely can too. Consider me your fellow learner, not some all-knowing NLP wizard. Let's learn by doing, shall we?
The Core Ingredient: Why Are We Even Doing This?
Think of it like this: AAC systems are amazing, but sometimes they're not quite as zippy as we'd like. We need these systems to be faster and to produce better-quality output. This isn't just my hunch; it's what we hear from our staff, clients, and years of digging into project feedback. We need efficiency.
What does that even mean in a world of AAC? Well, it could be speedier services, better quality support, or even device improvements like smarter input detection. But for today's special, we're focusing on the juiciest bit: Output. We want to churn out text (or "utterances," if you're feeling fancy) as quickly and clearly as possible.
The snag? Sometimes, the input data is, shall we say, a little "noisy." This "noise" comes from typing fast, or from access challenges. So, how do we clean up that messy input and support someone? My secret ingredient, the one I've been shouting from the rooftops for ages, is auto-correction. A side note: I don't mean auto-prediction here - I mean correction. Think of typing on those small keys on your phone. It works out what you could have meant to say and nattily corrects it before you even see the text go into the writing area. When you see it get something wrong, you think it's bad; when it gets it right, you assume it's your excellent writing skills. Truth is, it's happening all the time. So why isn't it standard in AAC, where mistakes are just as common?!
Step 1: Let's Bake Our Own! (Seriously, it's fun!)
Forget those fancy, pre-made mixes you just microwave. We're going old-school, from scratch! (Seriously, don't just ask an LLM to whip this up. You learn SO much by getting your hands dirty.)
If you were to create your own AAC correction tool, you've got options. But first, let's peek into our cupboard and see what kind of "noisy" sentences we're often dealing with. These are our raw ingredients:
Ihaveaapaininmmydneck (No spaces AND typos - a double whammy!)
I wanaat a cuupoof te pls (Double presses - sticky keys, anyone?)
u want a oakey of cjeese please (Positional errors - hitting nearby keys)
can u brus my air (Deletions - vanishing letters!)
Can you help me? Can you help me? (Repeats - when a stored message goes wild!)
Generally, these sentences are short and sweet. Sometimes, folks use word prediction, and boom! Perfect sentence. But we all know word prediction can be a bit of a brain drain, right? All that visual scanning and mental parsing? Not always the most helpful, generally speaking. (No, seriously now - there is good research showing that prediction hasn't been that helpful, but we DO know of people who use it a lot. One key thing: if you do want prediction, keep the visual load down by limiting it to three cells[^1].)
The Toughest Nut to Crack: Writingwitjoutspacesandtypos
This first one is the meringue-making of our baking challenge. How do you find where one word ends and another begins when they're all smushed together?
Enter Python and Grant Jenks' awesome wordsegment library. This clever little tool uses "unigrams" and "bigrams" (think of them as single words and two-word phrases, each with a special "weight" based on how common they are) to figure things out. Here's a taste:
from wordsegment import load, segment
load()
segment('thisisatest')
# Output: ['this', 'is', 'a', 'test']
Pretty neat, right? But here's the rub: language is super personal. Your favourite biscuit might be a "biccie," but if "biccie" isn't in wordsegment's dictionary, it's going to struggle. For example, Iwouldlikeabicciewithmycuppatea might come out as ['i', 'would', 'like', 'a', 'bi', 'ccie', 'with', 'my', 'cuppa', 'tea'].
You can still kinda figure it out, right? And here's a crucial baking tip: SOMETIMES, "GOOD ENOUGH" IS PERFECTLY FINE! Remember, most people aren't communicating with nobody - they generally have a communication partner, and co-construction is everything in AAC. We don't often need to do it ALL. A little help is often all that's needed. But what if we want to perfect our pastry?
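Before we perfect the pastry, a quick peek under the hood. The sketch below shows roughly how word "weights" can drive segmentation: try every split, score each candidate word by how common it is, and keep the most probable combination. It is a toy under stated assumptions, not wordsegment's actual implementation. The `COUNTS` table and the `segment_toy` function are made up for illustration, and it uses unigrams only (no bigrams).

```python
import math
from functools import lru_cache

# Made-up toy counts; a real table holds counts for a huge number of words.
COUNTS = {"i": 50, "would": 20, "like": 20, "a": 60, "cup": 8, "of": 40, "tea": 10}


def segment_toy(text, counts=COUNTS, max_word_len=20):
    """Return the most probable split of `text` under a unigram model."""
    total = sum(counts.values())

    def score(word):
        # Known words score by their relative frequency.
        if word in counts:
            return math.log(counts[word] / total)
        # Unknown chunks get a penalty that grows with their length.
        return -math.log(total) - len(word) * math.log(10)

    @lru_cache(maxsize=None)
    def best(rest):
        # Best (log-probability, words) pair for the remaining text.
        if not rest:
            return 0.0, ()
        options = []
        for i in range(1, min(len(rest), max_word_len) + 1):
            tail_score, tail_words = best(rest[i:])
            options.append((score(rest[:i]) + tail_score, (rest[:i],) + tail_words))
        return max(options)

    return list(best(text.lower())[1])


print(segment_toy("iwouldlikeacupoftea"))
# ['i', 'would', 'like', 'a', 'cup', 'of', 'tea']
```

Everything hinges on what is in the counts table, which is exactly why a word the table has never seen can get carved up in odd places.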
Adding Our Secret Family Ingredients (Personalization!)
What if we knew a person's actual vocabulary? (Big "if," I know!) We could sprinkle their unique words and phrases into wordsegment's data. Imagine taking an end-user's real-life text and using it to train the tool. It'd be like using your grandma's secret spice blend!
Check out the docs on customizing the corpus and our own code here and here. Fun fact: we even made typo-heavy bigrams based on real typos! This is a solid start, though capturing every personal nuance is a challenge.
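As a rough idea of what "sprinkling in" someone's vocabulary could involve, here is a minimal sketch that counts single words and adjacent word pairs from a handful of messages. The `history` list is invented sample data and `build_counts` is a hypothetical helper, not code from our repository. My understanding is that wordsegment exposes its tables as `UNIGRAMS` and `BIGRAMS` dictionaries after `load()`, so counts like these could be merged in, but treat that as an assumption and check the library docs first.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a message and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_counts(messages):
    """Count single words and adjacent word pairs across a user's messages."""
    unigrams, bigrams = Counter(), Counter()
    for message in messages:
        words = tokenize(message)
        unigrams.update(words)
        # Pairs are stored as "first second" strings.
        bigrams.update(" ".join(pair) for pair in zip(words, words[1:]))
    return unigrams, bigrams


# Made-up sample history, standing in for a real user's saved messages.
history = ["I would like a biccie", "A biccie with my cuppa tea please"]
unigrams, bigrams = build_counts(history)

print(unigrams["biccie"], bigrams["a biccie"])
# 2 2
```

Raw counts from a few messages are tiny next to a general-purpose corpus, so in practice they would likely need scaling up before they could outweigh the built-in numbers.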
We could also try a "fuzzy" approach (like this blog suggests). It's light on memory and doesn't need a super-fancy GPU, which is great for on-device use. It might not be lightning-fast, but perhaps "quick enough." Still, throw in acronyms, names, and weird abbreviations, and you've got yourself a truly tough cooking challenge!
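To give a flavour of fuzzy matching, here is a small sketch using Python's built-in `difflib`. It is a stand-in of my own choosing, not the method from the blog mentioned above. The `VOCAB` list and `fuzzy_correct` function are hypothetical, and the 0.6 cutoff is simply `difflib`'s default.

```python
from difflib import get_close_matches

# Hypothetical mini vocabulary; a real one would come from the user's own language.
VOCAB = ["cup", "of", "tea", "cheese", "please", "packet", "want", "you", "a"]


def fuzzy_correct(sentence, vocab=VOCAB, cutoff=0.6):
    """Replace each unknown word with its closest vocabulary entry, if any."""
    corrected = []
    for word in sentence.lower().split():
        if word in vocab:
            corrected.append(word)
            continue
        # n=1 asks for the single best match at or above the cutoff.
        matches = get_close_matches(word, vocab, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


print(fuzzy_correct("cjeese pls"))
# cheese please
```

Short or heavily mangled words often fall below the cutoff and pass through unchanged, which is the "tough cooking challenge" in action.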
The Fancy Pre-Made Sauce: Large Language Models (LLMs)
So, what about the new kid on the block, the Large Language Model? Let's try wrapping our noisy sentences in an OpenAI GPT-3.5 Turbo bow and see what happens:
https://www.youtube.com/watch?v=ExCukoFpaWY&list=PLWWQ5nlUD_ttgsAsjaUAyJz7GfFqOCePR&index=1
Not bad, right?! You can instantly see the magic: focus on typing, less visual distraction from predictions, maybe even lower dwell times (0.5 seconds in that video is practically unheard of!). This could be a game-changer!
But here's the colossal snag, the bitter aftertaste: All that super-personal data? It's flying off to the cloud, potentially helping train a commercial entity's models. And when we're talking about someone's unique communication, "personal" means really personal. So, while the pre-made sauce is convenient, it just doesn't sit right for privacy. We need something that's:
Efficient (Quick to cook!)
Privacy-focused (All in your kitchen, no external diners!)
AAC-Like text savvy (Understands our unique culinary style!)
The Holy Grail: Baking Our Own Language Model! (Woohoo!)
Okay, deep breaths. This sounds scarier than it is, I promise! To bake our very own language model, we need some fundamental ingredients:
Data. A whole heap of raw, noisy strings that look like your real user input, plus the perfectly "cleaned" version each should become. The closer it is to the sentences your users actually write, the better.
No, seriously, a LOT more Data.
Yep. Keep on going. More please.
A little bit of code (the "recipe card") to "train" our system. Let's not get bogged down in the fancy whisking techniques just yet. Let's just say you need a good food mixer!
Time, a bit of cash for the oven (aka a computer somewhere), and a tiny pang of guilt about the electricity bill (/burning of the planet).
You get the picture: DATA is our main ingredient. But here's the kicker: there's barely any "AAC user dialogue" data out there. What is "AAC-Like text," anyway?! Honestly, it's a bit of a shocker that our industry hasn't dug deeper into how people actually use AAC. Do they really use stored phrases all the time? What kind of language do they create? We're mostly just guessing! (I'm being purposefully harsh here - there are some great studies, but genuinely no large datasets.)
So, we have to "imagine" what AAC-like data looks like. Keith Vertanen made a fantastic attempt with his artificial dataset - kudos to him! We feel "spoken" corpora might be a good starting point, since AAC is essentially about "speaking" through a device. Let's check out these corpora:
The catch? These are mostly perfectly transcribed texts. Not many glorious typos, which is what our grammar correction tool needs to learn from. We need truly "noisy" input. So how are we going to do that?
Injecting Chaos: Making Our Data Deliberately Messy
So, how do we take these pristine datasets and get them messy? We could artificially create the noise using a tool like nlpaug, designed for exactly this. But how "real" are these artificial typos? We don't really know.
Ideally, we'd have real typos from the wild! Like those found here, or the TOEFL Spell dataset (essays by English language learners - goldmine!). And what about homophones, for those tricky folks who spell words like "they're," "their," and "there" interchangeably? This list could be a good starter for ten. (Quick disclaimer: Not everyone does this, but it does happen, see what I just did there?)
So, for our Model Cooking Session, we now have:
A jug of spoken sentences - some homemade, some store-bought. (I'm getting too carried away with this analogy, right? I mean real utterances plus some augmented ones, like the AAC text.)
A handful of real spelling errors.
And a good, well-made grinder to inject those errors right into our mixture!
We'll take our pristine corpora and programmatically make them gloriously noisy by swapping in words from our typo datasets. Yum! (Aka you need to write some code to actually do this. Sadly, this is where it isn't really like cooking.)
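Here's a minimal sketch of what that grinder could look like. It is not the script from our repository. The `TYPOS` lookup is a tiny invented sample (a real one would be built from the typo datasets), and `inject_typos`, `compress`, and the `rate` parameter are names I've made up for illustration.

```python
import random

# Invented sample lookup: correct word -> misspellings seen "in the wild".
TYPOS = {
    "cheese": ["cjeese", "chese"],
    "please": ["pls", "plese"],
    "brush": ["brus"],
}


def inject_typos(sentence, typos, rate=0.5, rng=None):
    """Swap words for a known misspelling with probability `rate`."""
    if rng is None:
        rng = random.Random()
    noisy = []
    for word in sentence.split():
        options = typos.get(word.lower())
        if options and rng.random() < rate:
            noisy.append(rng.choice(options))
        else:
            noisy.append(word)
    return " ".join(noisy)


def compress(sentence):
    """Remove all whitespace to mimic input typed without spaces."""
    return "".join(sentence.split())


# rate=1.0 swaps every word that has a known typo.
noisy = inject_typos("can you brush my hair", TYPOS, rate=1.0, rng=random.Random(0))
print(noisy)
# can you brus my hair
print(compress(noisy))
# canyoubrusmyhair
```

Passing in a seeded `random.Random` keeps the mess reproducible, which helps when you want to regenerate the same training set later.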
The Grammar Garnish: Adding Finesse
Now, what about grammar? I'm going out on a limb here, but a lot of AAC written text tends to have "poor" grammar. (Quick note: It's not really "poor" - what matters is communicating! But technically, better grammar can help our text-to-speech systems sound more natural, and it can fix sentences that might otherwise be misunderstood.) Well, like earlier, we could take our beautiful ingredients (our corpora) and strip out things like commas and apostrophes (which AAC users rarely include). But we could do with some better "base" material:
JFLEG Data: A respected source for grammar models. Sentences aren't very "AAC-like," but it's a good foundation.
C4-200M: A staggering 200 million grammatically wrong sentences, corrected! It's massive, so a smaller subset like this one is more manageable.
So, the recipe is coming together! Here's how we'll prepare our training data:
Take each spoken text corpus.
Inject typos into the training data and strip out grammar elements.
Create three versions of each sentence:
With typos and compressed (no spaces).
As it should be, but still with no spaces.
Also, add a grammar baseline training layer.
Then, we'll bake it all using a clever "text-to-text transformer" model. We need something pretty lean, remember, because it needs to run efficiently on a device!
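A text-to-text model learns from pairs: the messy input and the clean sentence it should turn into. The sketch below shows one way the recipe steps above might be turned into such pairs. The layout is my assumption for illustration, not a copy of our preparation script, and the `input`/`target` column names and the helper functions are hypothetical.

```python
import csv
import io
import re


def strip_punctuation(sentence):
    """Drop commas and apostrophes, which AAC users rarely type."""
    return re.sub(r"[,']", "", sentence)


def compress(sentence):
    """Remove all whitespace to mimic text typed without spaces."""
    return "".join(sentence.split())


def make_pairs(clean, noisy):
    """Build (input, target) rows for one sentence; the target is the clean text."""
    return [
        (compress(strip_punctuation(noisy)), clean),  # typos, no spaces
        (compress(strip_punctuation(clean)), clean),  # correct words, no spaces
    ]


def to_csv(pairs):
    """Serialise rows to CSV text with hypothetical 'input'/'target' columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["input", "target"])
    writer.writerows(pairs)
    return buffer.getvalue()


pairs = make_pairs("Need study motivation, any ideas?", "Need study motivaton, any ideas?")
print(to_csv(pairs))
```

The target always keeps its punctuation and spacing, so the model is asked to restore both in one go.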
Time to Fire Up the Oven!
Want to try this yourself? Grab the script for preparing the data here (you'll need to download the BNC2014 corpus and place it correctly).
It's a lengthy process, full of detailed steps for adding typos, cleaning, and prepping the text. Remember that old chef's adage: A good model is only as good as its ingredients! Here's a snapshot of our data pantry (NB: This is a bit out of date - but you get the idea):
| Data Source | Average Words per Sentence | Sentence Count | Notes |
| --- | --- | --- | --- |
| AAC Text | 7.72 | 1,504 | Artificially typo'd, 1,506 compressed variants |
| BNC & Daily Dialog | 9.71 | 885,186 | Real typo'd, 889,546 compressed variants |
| JFLEG Data | 16.70 | 754 | |
| C4 Subset Data | 9.76 | 41,394 | |
| Overall | 9.4 | 1,914,226 | File size: 196.78 MB, Split: 80/20 Train/Eval |
Training itself is surprisingly straightforward if you have a decent GPU. Our batch took about 3-6 hours to cook (evaluation alone took two hours!). The delicious result? A model you can find here on Hugging Face.
The Taste Test: How Did Our Bake Perform?
Let's see how our different "correction techniques" stack up on 39 test sentences (compressed and with typos):
The "algorithmic" approach, Wordsegment + Spelling Engine: Around 14 seconds. Not bad for a quick fix! BUT it won't deal with anything that ISN'T in its dictionary. So we can get this working in a test environment, but we know it will fail with anything novel.
Online Model: Azure/OpenAI GPT Turbo 16K: An astonishing 13 seconds! This is mind-blowing. It's an HTTP call all the way to the cloud and back, yet it's that fast. How?!
Ok, so now for our homemade models (these are different versions of our "cake" that I made):

| Method | Accuracy (%) | Total Time (seconds) | Average Similarity |
| --- | --- | --- | --- |
| Inbuilt | 0.0 | 17.32 | 0.93 |
| GPT | 55.56 | 13.29 | 0.92 |
| Happy | 28.95 | N/A | N/A |
| Happy Base | 13.16 | N/A | N/A |
| Happy T5 Small | 0.0 | N/A | N/A |
| Happy C4 Small | 0.0 | 46.90 | 0.81 |
| Happy Will Small | 28.95 | N/A | N/A |
| HappyWill | N/A | 24.67 | 0.93 |
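If you want to score your own bake, here is one way numbers like these could be computed. I'm assuming accuracy means an exact match with the expected sentence and using `difflib`'s character-level ratio for similarity. The `evaluate` helper is hypothetical, and the exact metrics behind the table may differ.

```python
from difflib import SequenceMatcher


def evaluate(outputs, references):
    """Return (exact-match accuracy in %, mean character-level similarity)."""
    if not outputs or len(outputs) != len(references):
        raise ValueError("need two non-empty lists of equal length")
    exact = sum(out == ref for out, ref in zip(outputs, references))
    sims = [SequenceMatcher(None, out, ref).ratio() for out, ref in zip(outputs, references)]
    return 100.0 * exact / len(outputs), sum(sims) / len(sims)


references = ["So stressed about this presentation!", "Guess who's single again!"]
outputs = ["So stressed about this presentation!", "Guess who single again!"]

accuracy, similarity = evaluate(outputs, references)
print(f"accuracy={accuracy:.1f}% similarity={similarity:.2f}")
```

Measured this way, a method that gets every word right but drops capitals and punctuation scores zero on accuracy while staying high on similarity, so it is worth looking at both numbers together.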
Here's a little nibble of the results:

| Incorrect Sentence | Correct Sentence | Output-Inbuilt | Output-GPT | Output-Happy | Output-HappyBase | Output-HappyT5 | Output-HappyC4Small | Output-HappyWill |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Feelingburntoutaftettodayhelp! | Feeling burnt out after today, help! | feeling burnt out aft et today help | Feeling burnt out after today, help! | Feeling burnt out today help! | Feelingburntoutaftettodayhelp! | Feelingburntoutaftettodayhelp! | Feelingburntoutaftettoday help!! | Feeling burnt out today help! |
| Guesswhosingleagain! | Guess who's single again! | guess who single again | Guess who's single again! | Guess who single again! | Guesswhosingle again! | Grammatik: Guesswhosingleagain! | Guesswhosingleagain!! | Guess who single again! |
| Youwontyoubelievewhatjusthappened! | You won't you believe what just happened! | you wont you believe what just happened | You won't believe what just happened! | You want you believe what just happened! | You wouldn'tbelieve what just happened! | Youwontyoubelievewhatjusthappened! | Youwontyoubelievewhatjust happened!! | You want you believe what just happened! |
| Moviemarathonatmyplacethisweekend? | Movie marathon at my place this weekend? | movie marathon at my place this weekend | Movie marathon at my place this weekend? | Movie Marathon at my place this weekend? | Movie marathon at my place this weekend? | grammar Moviemarathonatmyplacethisweekend? | Moviemarathonatmyplacethis weekend? | Movie Marathon at my place this weekend? |
| Needstudymotivationanyideas? | Need study motivation, any ideas? | need study motivation any ideas | Need study motivation. Any ideas? | Need study motivation any ideas! | Need study motivationanyideas? | Needstudymotivationanyideas? | Needstudymotivationanyideas? | Need study motivation any ideas! |
| Sostressedaboutthispresentation! | So stressed about this presentation! | so stressed about this presentation | So stressed about this presentation! | So stressed about this presentation! | So stressed about this presentation! | Sostressedaboutthispresentation! | Sostressedaboutthispresentation!! | So stressed about this presentation! |
| Finallyfinishedthatbookyourecommended! | Finally finished that book you recommended! | finally finished that book you recommended | Finally finished that book you recommended! | Finally finished that book you're recommended! | Finally finished that book yourecommended! | Finalfinishedthatbookyourecommended! | Finally finished that bookyourecommended!! | Finally finished that book you're recommended! |
So, yeah! Pretty sweet, right? On average similarity, our own home-baked model (HappyWill, 0.93) is right up there with GPT (0.92), the fancy, paid online service! GPT still comes out ahead on accuracy and speed, though… and let's not even mention the memory usage differences.
But here's something to chew on: the "inbuilt" non-LLM technique (that Wordsegment one). It's surprisingly readable, and no memory issues to worry about!
The Next Dish: A Reality Check
Now, for a live demo of our creation (it's like showing off your perfect soufflé!):
Watching this, you'll instantly spot a slight issue with our "test kitchen" data. It's actually not as messy or noisy as our real-world, imagined scenarios. Most test sentences only had a couple of typos, whereas our real AAC input can look like every word has had a wrestling match with the keyboard.
So, the next step in our culinary journey? We need to hunt down, or create, an even richer, noisier, and more authentic corpus of data. The quest for the perfect ingredients continues!
[^1]: Koester, H. H. & Levine, S. P. Learning and Performance of Able-Bodied Individuals Using Scanning Systems with and without Word Prediction. Assist. Technol. 6, 42–53 (1994).