How we can Engineer Language

Using the paradigm of Cathedrals vs Bazaars, in this talk Adam Leskis focuses on the current state of language learning applications and how they tackle the problem by creating a few complex things (Ferraris) when a focus on creating lots of simple combinable things (Ladas) might be better.

Introduction

This talk was part of JSOxford (May event).

[00:10:00] All right. Yes, I’m Adam. I work at Elsevier now, which is awesome because it’s actually my first job in the tech industry, thanks in large part to JSOxford and the community here. Very welcoming, very helpful. Give yourselves a hand, give yourselves a hand. Yes, everything is awesome and this is my first ever talk at JSOxford. I gave a little micro-talk two times ago, I think, but this should be the full shebang. Great. Lots of buzzwords here: we’ve got bazaars, we’ve got Ladas, cathedrals, Ferraris. I’ll explain all that stuff later. Language learning apps, there are micro-services, I don’t really know why I put that in there. It sounded kind of good, maybe we can figure it out as I go along.

[00:01:01] Okay. I know what you’re probably thinking. Right. Who’s this guy? Right? That’s Christian Bale, actually. American Psycho. Who’s this guy, right? Adam, you probably see me around at JSOxford and I’m really loud, I like to dance, whatever. Okay. Or, this guy here. A French flag, I made that hat out of duct tape, I was supposed to be Napoleon. Or, this guy, remember, I was giving a talk, I had a hat on, it was awesome. This is actually who I really feel like I am. I’ve been, or I had been, past perfect, working as a teacher internationally in a couple of interesting places for about eight and a half years, but I’ve always been an engineer at heart. I started out as an engineer in my undergraduate but I found out I wasn’t very good at maths, so that torpedoed it pretty quickly. I’ve always liked building things. Here, you see there are some hoses connected with duct tape to some funnels attached to the showers in our dorm bathroom that I hooked up to this thing here, which was a fully functioning, effectively a hot tub. It had an outflow pump that – well, not a pump – it was using pressure, physics, whatever, not a big deal, to circulate the hot water, the cold water. I’ve always loved building things. I thought, “How can we engineer language learning and teaching? I want to create something. I want to build something. Take something from in here and make it real.” Okay, maybe make it as cool as a train app. I probably can’t do that but we can always dream.

[00:02:36] The deal is, Google and others are always working on the stuff with business implications. The stuff I’m talking about nobody really cares about that much because you really can’t sell it, really, really simple stuff. Google is working on machine translation, semantic analysis, that’s not what we’re talking about. Okay. Language learning students, with the exception of very high stakes testing. We’re talking about the IELTS testing, I know OUP does a lot of testing materials – that’s big business – but that’s not what I’m talking about. Okay. Some current approaches exist, we have Duolingo, which some of you may be familiar with, and also Memrise. I couldn’t get the word on that graphic, so it’s Memrise. I’ll talk about it and show it later. What I want to say at the outset is, I know some people, maybe even in this room, really love Duolingo and if you like Duolingo and it works, awesome. Use it, do whatever works. What I’m suggesting is more of an additive approach rather than something that should substitute for it. Again, if you like this stuff, if it works for you, awesome.

[00:03:40] Let’s take a look at it. All right. Duolingo claims that 34 hours of Duolingo equals one university semester, 11 weeks. Well, that’s interesting. They’re claiming that. Well, let’s see, wait, fine print, this is the study that they did and the participants took a placement test at the beginning and at the end. Interesting. We’re obviously interested, what’s the test that they took? Is it a good test? Is it a bad test? Okay. It’s based on science, it’s a test. Thank goodness. What of this research? What of this test? I thought we could take a look at some of the greatest hits of the – I mean, it is an independent study. I don’t know why I put that in quotes. It was an independent study but let’s check it out. The main instrument, it’s this WebCAPE. Basically, what this means is it’s a test and you can check out the specifics of it on this website, which I did. It was interesting to note that it did not evaluate a lot of language skills they acquired for the two months of study, so in that regard, recommend addition to the written placement test to include some test of spoken proficiency. All this research did not have a spoken proficiency component. That’s interesting, so they’re saying, “We teach you just like a university course, except no speaking”. Well, okay, that’s not exactly like a university course. Another thing, in 2008, there was a test for spoken proficiency but not in the test that they’re referencing on their page that looks very nice and pretty.

[00:05:13] That’s a bit problematic. Okay? This is a sample of the questions on the test. Basically, we have sentence completion, what we’d call a “selection type item” because you already know what the possible answers are, so you just pick the one that goes in there, right? I don’t know if you can see that at the bottom there, that Carol doesn’t works full-time anymore, she teachers school until 12 o’clock and then she goes. Is that one word? Is that two words? It looks like there might actually be two errors in this sentence. Then the student has only, statistically, a 50% chance of guessing the right one even if they know they’re both errors. Tests usually have to have one right answer, so it’s easy to grade, so we don’t dispute the grade or whatever. That’s an issue. This is the test upon which they’re basing the study, upon which they’re basing the claims of Duolingo.

[00:06:02] Interesting. Flag on the play. That’s an American football reference, this is from Key & Peele. They’re fantastic, you should check them out, lots of dancing. Hingle McCringleberry on the right over there. Hold on, let’s take a time out. Quick primer on assessment. Super quick, in assessment there are basically two things we care about. We care about content validity and construct validity. You can tell we care about them because they’re in red. Super, super fast, content validity means does it test what we learned or what we need to learn? Basically, is it authentic? Is it realistic? Something like this, maybe in class two plus two equals four. Then on the test, John had three apples, lost one, what’s the mass of the sun? Well, you didn’t teach us that. What? How am I supposed to figure that out? F, you’re terrible. I’m a terrible student, no. Or, something like teacher covered chapter three but the exam was all from chapter five. Well, that’s not fair, you didn’t teach us chapter five. We had a Spanish test, the reading had a bunch of vocabulary we didn’t learn. Well, that’s not fair because if you’re assessing vocabulary in the test, you need to teach the vocabulary.

[00:07:15] Right, we studied Java applets but nobody uses those. Why did we study them? I’m awesome at Java applets, cool, I don’t care. Right. Construct validity means does it test what we claim it’s testing or what we think it’s testing? For example, this is a notorious one, all of the above, if it’s below none of the above then logically none of the above is included in all of the above. Then you turn it into this, “Can this student outsmart the test?” What you’re actually testing is critical and logical reasoning and not language ability. Very, very key. Some examples of this: it’s a speaking class but all we do is copy sentences off the board. You have written down three perfect sentences, you are ready to speak. Yes, well, no, I don’t think so. Or, it was supposed to teach us programming but the test was just questions about mathematical theorems underlying different algorithms. Which is interesting and applicable in some sense but there’s more to it. To do programming you need to do programming. If that’s all you’re learning, not exactly the whole story.

[00:08:20] Or, something like the international students, I heard this all the time where I used to teach, “International students, they’re really good at giving memorized speeches, why do they have so much trouble making conversation? I know they can speak English; I just saw them speak English for ten minutes.” Well, that’s a slightly different construct, it’s a slightly different target language use task. Okay, so let’s return to Duolingo with this idea of content validity and construct validity. Some tasty screenshots, that looks gorgeous. Way better than the stuff I’ve made. I like it, the colours, nice, beautiful, but wait a minute, did you already spot the problem? Let’s zoom in on it. Select the boy. If you are a human and you know what things are, you can pick the boy without speaking a word of Spanish. This one is probably construct validity because you don’t need to know Spanish to answer this correctly. Also, nice beautiful flashy display, colours, we’ve got pictures, we’ve got buttons shaded on the bottom. I love it, but let’s zoom in on that. Translate it, “Por qué no comes fruta?” Well, let’s see, how would I say that in English? Why don’t eat vegetables. No, there’s something missing. It’s not possible to answer that from the selection type item. You can’t answer that correctly in standard English. Does that mean that you get a zero? Did you fail? That’s a big, big problem. Big problem. I took the placement test for Spanish. Nine percent fluent, put it on my CV. The trials I faced, thank you, gracias. Fantastic. I am now ready to go to Spain and type things into boxes.

[00:10:13] They weren’t all a cakewalk. I had to listen to this thing, el pingüino, come fruta? We had a problem though, almost correct, do you know what I was missing? I was missing the little umlaut over the u. Wait a minute. Is that a listening test or is that a spelling test? That’s problematic or it could be. Type what you hear. Ella no toca la carne. Fantastic. I did it perfectly, I’m so smart, but is that something that I need to learn how to say? She does not touch the meat. Okay. Okay, interesting. All right. Naturally, I wanted to see how they’d rate me as an English speaker. I feel pretty confident in my English. It’s American English but please forgive me that. This is – wow – this is really legit, really well-designed. 90%, maybe I can do it as well as that. Interesting. Wait a minute, wait a minute, wait a minute. It wants me to pay $20 for a certificate. I don’t know about that, but there’s a quick test so maybe I’ll just do that one.

[00:11:17] Okay, here we go. Literally, a third of the test was, choose which words are the real English words. Like you do all the time when you’re interacting with your friends. Another selection type item. Do you think I got them all? Well, you know that I didn’t. No, wainch. Oh my goodness. You could make an argument that that’s not standard English, so possibly the students shouldn’t be expected to learn that, but it ties into a larger discussion of who gets to define what a standard word is. Maybe the students want to learn about watching YouTube videos like this and they want to know what a wainch is. Okay, no. Another one that I say all the time, the bishop has been opening the restaurant early. As he often does. Right? Very hard working bishop. Content validity. This one too. The entrepreneur drinks with the chairman. I don’t know. Content validity. This was the other third of the test, selecting words just like in real life. Have you ever had that experience where you go up to talk to one of your friends and they hand you a little tablet to select the words that you’re going to use to communicate with them? Now, you could argue that this is a good effective test, possibly of vocabulary and syntactic knowledge, but does it basically mean you are a competent speaker or a competent reader or a competent listener or a competent writer? Yes, literally, a third of the test. Again, another selection item. My goodness, 100%. I feel so happy with myself. Wait a minute, “can understand virtually anything heard or read, even intellectually demanding”. I’m ready to study at a UK university, right? My goodness. The thing is, so many times, and I’ve seen this not in my classes obviously, but some of the teachers that I mentored, they say, “We had the students listen to the listening. Yes.” I say, “Okay, and then what?” They say, “Well, no, no, they listened, so they listened.” I was like, “Yes, yes, yes, but we listen in order to do things. We listen when it’s part of our conversation, in order to talk or to think about something and respond appropriately, or whatever.” Usually, you’re not just listening unless it’s for pleasure, which we don’t really need to assess anyway. Same thing with reading. Anyway, I’m an expert in English, put it on the CV. Man, that CV is filling up.

[00:13:51] Duolingo has some issues, some issues. What about Memrise? That also looks very nice, very slick design, beautiful colours. Okay. We’ve got a little thing here and you wouldn’t see all three of these screens at the same time. Obviously, the fact that I can come and point to this, does this mean I now know “culture” in Chinese and if so, do I know when to use it appropriately, and can I hear it when somebody says it? There are so many more things involved in learning a word. What are the synonyms, what are the antonyms? What does it go together with? What verbs can take it as an object? There are some issues there. Yes, the chav slang. I forget what the answer to this was but it was something like, “[Inaudible 00:14:35]”, so I’m working on my chav slang. Again, a selection type item. I don’t need to remember any of these words because they’re presented to me. All I have to do is organize them, which can be useful, again, for syntactic knowledge and things like that but it’s not exactly construct validity because that’s not how we actually talk and communicate. Here, I was like, “What’s even being learned or tested?” If you just know the word for wine, well, it’s process of elimination, none of those other things have wine in them. Are you testing wine? Are you testing this Vudre, this wood? Or, even if somebody said that, is a possible logical response, “You speak French very well”? Well, sure, I could think of a logical situation where that would – is it a response? Is it translation? What is it? It’s a problem. It’s not all bad though, this actually was pretty nice. This was you listen and then you tap the translation for what you heard, which is still not 100% authentic but we’re getting there.

[00:15:38] Duolingo, this I kind of liked. Duolingo would give me the sentence that I would read, okay, construct validity, and then I would say it in Spanish. Apparently, I did it very well. I was very pleased with myself. Again, very, very short sentences, not contextualized, a bit of an issue but we’re getting there. That’s a little bit better.

[00:15:39] To review, both apps are essentially flash cards. We’ve had flash cards for a long time. Quizlet is basically flash cards, which does have its uses. I’m not saying, “don’t use it, it sucks”. It’s awesome and it does have its uses but I think we can do more. They focus on language knowledge, very discrete things. What are translations, matching things, instead of language use. There are claims of teaching. Yes, this is teaching and this is a language. A bit misleading. Limited basic language usage. These could be done better. That’s what I’m going to pivot to. Okay, when are we going to get to the cathedral factory? Right, littlefrankyact.com, love it. Okay. That was the one with Poochie, I think. Anyway, it doesn’t matter. These are essentially cathedral approaches to language learning apps. This references the cathedral versus the bazaar, which in essence is: do we focus on building out a lot of features for something that’s really, really complex, or do we just build one little feature, and it’s open source, so we can just ship it when it’s ready? It doesn’t necessarily depend on connections to all these other dependencies and things like that. That’s how I read it, at least. I didn’t know JavaScript earlier during Joe’s presentation. I could be completely wrong. Okay.

[00:17:14] For metaphorical purposes, Ferraris, really expensive things. Now, Duolingo, it is free but how long did it take people to create that? I’m assuming a lot of hours. What we need is Ladas, those really cheap Russian cars. Very cheap, dumb things. I’ll get to why they can be dumb because we’re going to – I don’t know – what do you think, Obama? Okay. Cool. I’m making metaphors great again. Yes, they try to do everything, they’re like, “We will teach you a language.” Well, there are actually a lot of things involved in teaching a language. The claims are a bit overstated. From a psychometrics or assessment perspective, they do very little, or definitely not as much as they claim. It’s mostly knowledge- and comprehension-based, over-engineered electronic versions of flash cards, and classrooms. What I mean when I say “classrooms” is that another big thing in tech now is not just the MOOCs but the, “We’ll go into either a chat room or a forum or something and the students will interact and they will learn language.” It’s like you’re just reproducing the classroom, when it takes humans interacting with humans. That’s good but can we move past that? We’ve had that for a while.

[00:18:29] We need pieces that are smaller, more modular, things that can be combined. Many people working on their own small piece. This is my vision of the future. I’ll elaborate in a little bit. We can achieve complex things, not necessarily teaching an entire language. That’s probably a bit beyond the scope of what I’m thinking immediately. Say, for example, writing a research essay in English. By accomplishing and putting together many, many very small pieces. Many tasks can be decomposed into smaller sub tasks, right? Yes, what do students need to do in order to write a research essay in English? They don’t need to select little drop-down buttons on a screen, no, it’s much more complicated. Let’s take a look. All right, choose relevant sources, take notes on the relevant sections of the relevant sources, choose which sections from the sources to combine as evidence, you don’t want all of them, maybe they’re not all relevant. Paraphrase the combinations because of course you don’t want to plagiarize. Synthesize the common element of each relevant section, so you could put together in one paraphrase. Vocabulary considerations, tone, formality, take notes on citation information so you can include it later. We haven’t even started writing yet. These are very, very, very complex tasks.

[00:19:40] They can be decomposed into a number of very, very small tasks and that’s what I’m going to focus on, is academic writing. Okay? This is not going to be all academic stuff but this is what I’m talking about. Look at those colours, so amazing. There are four of them, maybe five. Here, this is all this does, is returns actual Tweets, authentic language, so we do have content validity and it is selection items. Okay, okay, that’s fair, but you’re selecting which articles to put in the spaces. Very, very difficult thing to do in English is to choose the correct article, we’re saying, “A, an, and the”, okay? Very, very discrete tasks, very small task. Similar thing but taken from Reddit. Here we have the writing prompts from the subreddit and you click on one and it gives you something like this. Now, this is a bit more contextualized, so a bit more authentic, but still they are selection items. I wasn’t very, very happy with that. The live demo, no, live demo not finished. Well, it is finished but I didn’t want to focus on it because I thought that would take too much time. Let me just show you here.

[00:20:49] Here’s the same thing. We’ve got the writing, the writing prompts from Reddit. Now, here, you do know where the articles need to be put back in, the little asterisk. It is still a selection activity but what if we don’t have those and we make it a supply activity? What I’m arguing is this looks fundamentally 100% authentic from the point of view of the student. When they are writing their own academic essays, they don’t necessarily know where the problems are or what the problems are. They need to attune themselves, they need to become sensitive to looking for the markers that could tell them, “Maybe there’s a problem”. The markers generally tend to be the singular countable nouns. They tend to cluster around those. What else could we do? How about auto generating materials? All that stuff, auto generated. The code just grabs it from Reddit and auto generates it. Really, really simple stuff. Fiction, that’s what I did, right. Full-text research articles are difficult to get but abstracts, I’m getting those from arXiv, or however you want to say it. Those are easy to get and manipulate to simulate writing a research abstract because that’s what PhD students have to learn how to do. News articles, APIs, really common. Actually, both of those, their terms of service were not good so I ended up using BBC business, but very easy to get and manipulate. We’re not talking about MapReduce and stuff like that. This is taking strings, taking out one little match and then manipulating. Very, very simple stuff.
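As a rough illustration of the "taking strings, taking out one little match" idea, here is a minimal Python sketch of the article exercise. It is not the speaker's code, which the talk never shows: the function name `strip_articles` and the `mark_gaps` flag are made up for this example. With `mark_gaps=True` each removed article leaves an asterisk (the marked version described above); with `mark_gaps=False` the gaps are unmarked, which is the harder supply version. A plain regular expression like this will also catch the odd non-article, such as the letter in "vitamin A", so treat it as a starting point only.

```python
import re

# Matches the English articles as whole words, plus any whitespace after them.
ARTICLE = re.compile(r"\b(a|an|the)\b\s*", flags=re.IGNORECASE)


def strip_articles(text, mark_gaps=True):
    """Remove a/an/the from text and return (exercise, answers).

    Hypothetical helper for illustration. With mark_gaps=True every removed
    article is replaced by '*'; with mark_gaps=False nothing marks the gap.
    """
    answers = []

    def _replace(match):
        # Record the removed article so the exercise can be checked later.
        answers.append(match.group(1).lower())
        return "* " if mark_gaps else ""

    exercise = ARTICLE.sub(_replace, text).strip()
    return exercise, answers


prompt = "The dragon found a sword in an old cave."
print(strip_articles(prompt))
# ('* dragon found * sword in * old cave.', ['the', 'a', 'an'])
print(strip_articles(prompt, mark_gaps=False)[0])
# dragon found sword in old cave.
```

The answer list is kept in order of appearance, so a student's restored text could be checked gap by gap without a teacher preparing anything by hand.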

[00:22:32] The problem is that requiring a teacher or anybody, even a student, to create materials takes time and usually money. If you’re relying on that, you can’t scale. For academic reading activities, we could highlight tenses by section, very, very easy to do, not computationally or technically difficult. You could highlight tone, stance. That’s what Liam and I did with the fields or something. Is it positive or is it negative? How positive? How negative? Students need to be attuned to this when they’re writing research articles, research essays because it can affect the meaning. Highlight features, I don’t know, anything. For example, if the student is using “because” to show a cause and effect relationship, every single time, even though that is not a grammatical issue and does not affect meaning, instructors will grade down for that and say, “This style, it just feels bad.” Even stylistic issues we could target. Maybe suggest instead of “because”, how about “in order to” or “for the purpose of” or whatever. Yes, super, super easy. I know they’re easy because I know how to do them, so I know they’re easy.
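The "because" example can be sketched in a few lines of Python. This is an illustration, not something shown in the talk: the name `flag_overuse`, the `max_uses` threshold and the dictionary layout are all assumptions, and the suggested alternatives are simply the two phrases mentioned above.

```python
import re
from collections import Counter

# Alternatives taken from the talk; a teacher would curate a fuller list.
ALTERNATIVES = {
    "because": ["in order to", "for the purpose of"],
}


def flag_overuse(text, max_uses=2):
    """Report watched words used more than max_uses times (hypothetical helper)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {
        word: {"count": counts[word], "try_instead": options}
        for word, options in ALTERNATIVES.items()
        if counts[word] > max_uses
    }


essay = (
    "I left because it rained. I stayed because it was warm. "
    "I slept because I was tired."
)
print(flag_overuse(essay))
# {'because': {'count': 3, 'try_instead': ['in order to', 'for the purpose of']}}
```

All this does is point at a pattern in the student's own text, which is the kind of feedback the talk argues does not need a teacher in the loop.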

[00:23:36] Practicing revision, you like that British spelling of “practicing”? Know your audience, know your audience, man. Strip articles, that’s what I did. Change verb forms. You just find the modals and you change them, super easy. Highlight notes, like noun phrases. How hard do those look? They’re not difficult. We’re not talking about deep text analysis or machine learning or anything like that. You say, “That’s boring, I don’t care about text strings, let’s do some cool stuff.” Speaking practice, here’s where it gets exciting. Accelerometers in your phones are sensitive to 0.1 millimetres, I think that’s the rating, they can detect heartbeats. I’m wondering, if you put it to your throat, can they detect voiced and unvoiced consonants? I always use this with my students, you can touch the top of your head too. When you say a B sound, like a “b, b, b, b” it vibrates. When you say a P sound, “p, p, p, p” it doesn’t vibrate. Students, especially some of the… I don’t remember where they were from, but they had really big problems – I know, the Arabic students – traditionally have problems with the Ps and the Bs. They can stand there all day and I can say, “No, you’re not doing it right. No, you’re not doing it right.” That’s not fundamentally helping them. They need some sort of visualization, some feedback; where, one, they don’t need my interaction, they can do it themselves anywhere they want, and two, it’s a bit more reliable, instead of me saying, “No, no, no, no.” A teacher.

[00:24:56] Replay user audio with artificially stressed voice contours. A lot of times, students have trouble and they speak in monotone voices because where they’re from, this signals politeness and respect and everything, but if you’re giving a talk like this, it could be perceived as maybe you’re not interested, you’re bored, you don’t want to be giving the talk. These functions of language use, “Why are you talking like that? Something’s weird, I don’t know.” Some students do want to work on this, they do need to work on this. I don’t know, I’m working on something like that, “Working on”. Measuring and – volume peaks. Another thing with stress is it’s higher and longer and louder. Can we measure that? If we can measure it, we can have the system give feedback on whether the student is producing the expected patterns. Speak Speak Revolution, that was an idea I had. The Koreans would be really into it, where you move with a Fitbit or something, like a Dance Dance Revolution. Anyway, something fun.
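Of the three cues mentioned (higher, longer, louder), loudness is the simplest to measure. The Python sketch below is only an illustration of that one cue and is not from the talk: it assumes the recording is already available as a list of amplitude samples between -1 and 1, and the names `frame_rms` and `loudness_peaks`, the frame size and the `ratio` threshold are all made up for the example. A real tool would also need pitch and duration.

```python
import math


def frame_rms(samples, frame_size):
    """Root-mean-square loudness of each non-overlapping frame of samples."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_size]) / frame_size)
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def loudness_peaks(levels, ratio=1.5):
    """Indices of frames louder than both neighbours and above ratio * mean."""
    if not levels:
        return []
    mean = sum(levels) / len(levels)
    return [
        i
        for i in range(1, len(levels) - 1)
        if levels[i] > levels[i - 1]
        and levels[i] > levels[i + 1]
        and levels[i] > ratio * mean
    ]


# Toy signal: quiet, loud, quiet. The middle frame is the stressed one.
stressed = [0.1] * 4 + [0.9] * 4 + [0.1] * 4
print(loudness_peaks(frame_rms(stressed, frame_size=4)))  # [1]

# A monotone signal has no frame that stands out.
monotone = [0.5] * 12
print(loudness_peaks(frame_rms(monotone, frame_size=4)))  # []
```

Comparing the detected peak positions against where stress is expected in a sentence would be one way to give the automatic feedback described above.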

[00:25:55] Here’s where it gets way out there, my God. Visual feedback based on a mouthpiece. Based on the place of articulation when you’re making the sounds. The Ls, we call it “a lateral approximant” because your tongue is touching here, the air goes around; whereas an R is a central approximant, your tongue touches on the side, the air goes through the middle. That’s very difficult for the student to see. It’s like, “No, I’m doing it. R, r, r, l, l” and I’m like, “I can’t – what?” If they can’t see it in me and if they can’t see it in themselves, well, how are they going to get feedback? If we had tongue position through some sort of device piped to a visual display on a phone, they could get real-time feedback anytime, instantly. Laryngeal motion, a neckpiece, I don’t even know if this would work. It’s an idea I had for tones, higher tones. Your Adam’s apple goes up and down. Does anyone want to hack on this? I’d be super keen to do that. Whatever. That would be awesome.

[00:26:56] Things we are not trying to do. We’re not doing speech recognition; no. I don’t know how to do that. Google is working on it and that’s awesome. Semantic analysis, Google is working on it, let them do it. Interacting communicatively with users, passing the Turing test, I’m not trying to do that. We don’t need super sophisticated apps, we need very, very simple apps because the job of teachers is not super sophisticated, we mostly just point at things. Either we’re pointing at something the student is reading, like a material, like, “Look at this, look at this form here. Did you notice this? Why is this being used?” Or, we point at something the student has produced, “Why did you use this? Look, here, does this have an article? Does it need an article? Look here, look here.” We’re just pointing. I don’t do anything; the students do everything. If we can have a system do that for them, they don’t need me anymore and it’s way more stable.

[00:27:52] Possible issues: nobody is going to pay to develop that and nobody even knows if it works because I haven’t invented it yet, so we haven’t tested it. It needs a cross-disciplinary understanding of coding and language assessment. The issue is that from the technical side, I almost feel like this stuff is too simple and it’s not interesting for anybody. I get that, I definitely get that. Definitely, there isn’t a business case for the super, super simple stuff. An ecosystem maybe but not one at a time. From the teacher’s side, what I’ve seen, the cutting edge of technology and learning is, “Let’s make a PowerPoint presentation and make it a video and then put it on YouTube.” I’m like, “We can do more, we can do more.” I know how to do this. It’s going to take me a long time by myself, if anybody wants to hack on something with me, that would be super awesome. I’ve got business cards. We could do a little thing. It’s cool.

[00:28:42] What’s the endgame? We change the whole paradigm. We’re going to change everything. Everything. I don’t know. Maybe not everything. We’re going to move it into the hands of the users. Get some automated, real-time, scalable feedback. We’re going to auto generate these materials. Obviously, there will always be a place for teachers but we can also do things that maybe don’t demand the time of teachers so much, and provide instantaneous feedback. Like boom, boom, boom. Will it work? I have no idea. I don’t know. I’m going to find out though, because I’ll tell you what I’m not doing. I’m not going to wait around and just keep this like, “Business as usual”, we could do better. We could definitely do better and yes, I don’t know if any of this is going to… again, because I need to get the ideas out and show people and then maybe people will buy into it, maybe. It may fail titanically. Yes, Duolingo, Memrise, they’re just spokes on a wheel. Lannister, Stark, Baratheon. I’m not going to stop the wheel; I’m going to break the wheel. Take that, Khal Moro. Take that, Khal Drogo. Spoiler alert. Anyway, yes, we’re going to change the whole thing. That’s it. No code in presentations means don’t ask me about code. Ask Joe about code, anything else is fair game. That’s it.