Eat up Martha: Thoughts on Early LLMs
Early versions of touch screen devices had questionable utility. The Apple Newton, with its handwriting recognition feature, is probably the easiest of these to dunk on, as The Simpsons famously did.
For $900 in 1993, or about $1,960 in 2024 dollars, you could own a notepad that sort of worked with a stylus and touch screen interface.
Of course, I dreamed of having one anyway. As a Star Trek: The Next Generation fan, I wanted to walk around and work with a TNG-style dataPADD. I eventually bought a used Pocket PC, which, unsurprisingly, wasn’t the answer to all my school note-taking and research needs. The user interface for touch screens and handwriting recognition was innovative but had not yet fulfilled its potential for productivity.
In glorious ultra-high-definition hindsight, however, these early iterations of touch screen user interfaces heralded something that would truly change our lives.
When working with today’s large language models (LLMs), such as ChatGPT, Gemini, and Perplexity, I can’t help but find parallels between their language interface and those early handwriting interfaces.
While studying Linguistics in college, I sometimes wondered what it would be like if it were a core subject like Math, History, and English, or even an advanced extension of English, the way Calculus is for Math.
Although I think it’d be good if everyone knew a few linguistic concepts, the real benefit would come from the initial hurdle students have to clear when they begin to study language systematically. That hurdle is our own intuitions about language.
Linguistics attempts to study language outside of these intuitions (Hayes, 2021). Since infancy, we have been intuitively acquiring the complex grammar of our native language, much like taking our first steps.
But studying linguistic data in your native language, or in any language you’re familiar with, means adhering to an analytical process, one in which examples of a language you know are treated as just data. It’s pretty common in a Linguistics course for a student to take a moment to understand the point of an example because an informal or inconsistent structure “sounded right” to them, for example, “between you and I.”
It kind of reminds me of writing code versus reading and understanding that same code months later: it’s much more difficult to read your own code after months away than it was while you were writing it. We may know the grammar of our native language intuitively, but talking about its complex components in a descriptive and analytical way can give a person a headache.
Human languages are complex, and if languages are complicated, it’s no wonder communication is complicated. We can run into real problems with it, but more often than not communication is good enough, and if we just navigate its limitations and issues, we should be okay, right?
Well, what happens when we use language, this human API, as it were, in a novel way, such as when interfacing with a verbal artificial intelligence?
The computer mouse is also an interface, a hardware interface, most often used together with a graphical user interface (GUI). It’s a brilliant design. It’s intuitive, almost familiar, but also, at the time of its invention, different and novel. I believe that was the reason for its success: its usage is analogous, not obstructive.
If a computer’s graphical interface is analogous to a physical desktop, which I can interact with by moving around and reading papers, books, maps, charts, etc., then it would be familiar and intuitive to interact with the computer screen in an analogous way.
A stylus with a special surface was an early attempt at this interaction.
A stylus was on Douglas Engelbart’s mind when, at a computer graphics conference in 1963, he wrote in his notebook (Bardini, 2000):
Separate possibility of a “bug” instead of a stylus. Bug being something that does not fall if you take hands off-just stay where you left it. Much better for coordination with the keyboard. Also easier (more natural space)
Engelbart and colleagues later renamed his invention the mouse, famously demonstrated in The Mother of All Demos in 1968.
Engelbart’s notes convey the simultaneous familiarity and novelty of the mouse design. I can use my hand to interact with it like a pencil or pen, and when used with a computer keyboard, it’s as if the devices are in collaboration, not in an awkward back-and-forth.
There’s something about the stylus used as a pointer device that’s at once too familiar and uncanny. You already know the kind of work you can do with a pen on paper, but there’s something off about looking up at the screen instead of down at the writing surface while using a stylus this way. In attempting to be analogous, this type of stylus was obstructive. It doesn’t seem to want to coexist alongside my experience with a writing instrument.
Chatbots evoke this same too-familiar, uncanny sensation. The entire process seems like productivity. I know from experience that I can use language to communicate, and I will assume that this communication will be good enough most of the time. But is that happening most of the time with a chatbot, or are my intuitions about language and my assumptions about communication interfering with my use of this interface?
I wanted to take a minute to say to all the present-day electronic stylus users out there: no shade. I love the Apple Pencil, for example. It works well, just like a writing instrument, and it’s a pleasure to work with.
Anthropomorphizing AIs, and LLMs in particular, is a topic being discussed from several perspectives. Ethical risk is the one you’ll see most often, and the AI Bill of Rights has been a touchstone for that discussion.
Personally, the objection to anthropomorphizing AIs that I’ve heard the most goes something like “if you’re anthropomorphizing a chatbot, you’re not using it right.” Even the article “Anthropomorphism in AI: hype and fallacy,” which is mostly about ethical issues, states that anthropomorphizing “is also reductive because it asserts an out-of-place, bio-centric perspective that can overlook the unique potential of artificial systems” (Placani, 2024). So words such as knowing, forgetting, learning, thinking, and recognizing shouldn’t be in your mind when you interface with an LLM, that is, when you write your prompts and read and react to chatbot responses.
No doubt there are some who have optimized a workflow with chatbots, just as there were users of the RAND Tablet, an early stylus-and-tablet input device, who were able to get into a flow with it. But with chatbots we’re being asked to achieve this workflow by using our language in a familiar and yet uncanny way. Like a stylus as a pointer device, there’s something about it that’s unaccommodating and off. This is not obvious unless you consider the interaction outside of the interface, like a linguist analyzing a language outside of her intuitions about it.
What’s happening between you and a chatbot is communication, and we have intuitions about communication acquired from childhood onward. Yet to use an LLM effectively, it seems we’re also expected to set aside these intuitions, like someone at a desktop computer with a stylus who has to remember to look up at the screen, not down at the writing surface, and then awkwardly set down the stylus to work with the keyboard. Only here it is our language, our main means of interfacing with other human beings, that we’re being asked to modify, not a computer hardware interface.
Circumventing our intuitions about language takes conscious effort, but what if we also need to adjust our core expectations of communication?
Software development has a unique relationship with the sunk cost fallacy. The Mythical Man-Month is a classic computer science book about the pitfalls of throwing more and more programmers at a struggling project. Technical debt, hotfixes, refactoring, and resistance to refactoring are all issues that every software engineer has faced.
Something similar may be happening when working with an LLM for coding. It’s tempting not to throw away a mediocre but incredibly detailed answer entirely, and I wonder whether instances like this cancel out the time saved by good-quality answers. The feeling is intensified when using the same prompt with different LLMs and comparing the answers: often each answer contains some helpful details that the other does not.
But I feel there’s a different sunk cost at work when using chatbots, one unique to LLMs and language. It has to do with our understanding of communication. Language is a good-enough form of communication. To use a computer networking analogy, it’s sort of like UDP, not TCP: best effort, with no guarantee of delivery. But we’re used to it being good enough most of the time.
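For anyone who hasn’t had to think about those two protocols, here is a minimal Python sketch of the contrast; the localhost address and port are just placeholders, with nothing actually listening on the other end.

```python
import socket

HOST, PORT = "127.0.0.1", 9999  # placeholder address; nothing is listening here

# UDP: fire-and-forget. The datagram is handed to the network and the sender
# gets no acknowledgment; it may arrive, arrive garbled, or never arrive at all.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.sendto(b"did you get that?", (HOST, PORT))  # "succeeds" even with no listener
udp.close()

# TCP: a connection must be negotiated first, and the stack acknowledges and
# retransmits until the bytes arrive in order, or it fails loudly.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    tcp.connect((HOST, PORT))          # raises ConnectionRefusedError: no listener
    tcp.sendall(b"did you get that?")  # only runs once the other side has agreed
except ConnectionRefusedError:
    print("TCP won't even pretend: no listener, no conversation.")
finally:
    tcp.close()
```

In this analogy, human language is the UDP side: we send, we assume the message landed, and we rarely stop to ask for an acknowledgment.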
Unless we’re willing to keep up the linguistic mental gymnastics mentioned above, this “good enough most of the time” assumption is going to kick in.
So then, when a conversation with a chatbot works, and when it doesn’t, what have we learned?
Historically, dead ends in software development often pay off in a residual-value sense. You usually gain general domain knowledge that puts you in a better position to see other potential solutions. But to get to this benefit, as a therapist might say, you need to put in the work.
No doubt there are already some who have committed to a rubric and developed a workflow that yields more potential from verbal artificial systems. In my experience, if a conversation with a chatbot doesn’t yield results, I’m tempted to start from scratch on my own, losing that time worked. Still others might decide whatever was provided is good enough, because that’s how we’ve communicated with other verbal beings our entire lives. It’s also tempting to press on, and hard to resist anthropomorphizing the effort made to give us information, because that’s an understood dynamic of conversation, too. How many times have you started a conversation with a chatbot with a please and ended it with a thank you?
But what I described above is still mixed results for an interface. Are these mixed user results good enough? Is there a better AI model and a better interface for us?
Or you might consider refining your prompt game with Logitech’s Logi AI Prompt Builder: use an AI to talk better to an AI. No, it’s not an April Fools’ Day product.
If you’ve worked with a chatbot more than a few times, you’ve probably been offered a “breakdown” of something. You might even like the word. It’s a prelude to a stream of information provided just for you. But a breakdown has other meanings. One of many fascinating topics in linguistics is semantics.
We have an intuitive grasp of our native language, but language itself is complex, and using it to communicate, its main purpose, is also complicated. Must we adapt in order to use chatbots to their full potential, like an Apple Newton user slowly writing and rewriting their words, rather than using our language intuitively? That’s not convenient, and it’s not worth the cost to our understanding of communication.
Bardini, Thierry. Bootstrapping: Douglas Engelbart, Coevolution, and the Origins of Personal Computing. Stanford University Press. 2000.
Hayes, Bruce P. Introductory Linguistics. 2021.
Placani, A. “Anthropomorphism in AI: hype and fallacy”. AI Ethics 4, 691–698 (2024).