Interview: Larry Wall
Perl 6 has been 15 years in the making, and is now due to be released at the end of this year. We speak to its creator to find out what’s going on.
Larry Wall is a fascinating man. He’s the creator of Perl, a programming language that’s widely regarded as the glue holding the internet together, and mocked by some as being a “write-only” language due to its density and liberal use of non-alphanumeric characters. Larry also has a background in linguistics, and is well known for delivering entertaining “State of the Onion” presentations about the future of Perl.
At FOSDEM 2015 in Brussels, we caught up with Larry to ask him why Perl 6 has taken so long (Perl 5 was released in 1994), how difficult it is to manage a project when everyone has strong opinions and is pulling in different directions, and how his background in linguistics influenced the design of Perl from the start. Get ready for some intriguing diversions…
Linux Voice: You once had a plan to go and find an undocumented language somewhere in the world and create a written script for it, but you never had the opportunity to fulfil this plan. Is that something you’d like to go back and do now?
Larry Wall: You have to be kind of young to be able to carry that off! It’s actually a lot of hard work, and organisations that do these things don’t tend to take people in when they’re over a certain age. Partly this is down to health and vigour, but also because people are much better at picking up new languages when they’re younger, and you have to learn the language before making a script for it.
I started trying to teach myself Japanese about 10 years ago, and I could speak it quite well, because of my phonology and phonetics training – but it’s very hard for me to understand what anybody says. So I can go to Japan and ask for directions, but I can’t really understand the answers!
So usually learning a language well enough to develop a writing system, and to at least be conversational in the language, takes some period of years before you can get to the point where you can actually do literacy and start educating people on their own culture, as it were. And then you teach them to write about their own culture as well.
Of course, if you have language helpers – and we were told not to call them “language informants”, or everyone would think we were working for the CIA! – if you have these people, you can get them to come in and help you learn the foreign language. They are not teachers but there are ways of eliciting things from someone who’s not a language teacher – they can still teach you how to speak. They can take a stick and point to it and say “that’s a stick”, and drop it and say “the stick falls”. Then you start writing things down and systematising things.
The motivation that most people have, going out to these groups, is to translate the Bible into their languages. But that’s only one part of it; the other is also culture preservation. Missionaries get kind of a bad rep on that, because anthropologists think they should be left to sit there in their own culture. But somebody is probably going to change their culture anyway – it’s usually the army, or businesses coming in, like Coca-Cola or the sewing machine people, or missionaries. And of those three, the missionaries are the least damaging, if they’re doing their job right.
LV: Many writing systems are based on existing scripts, and then you have invented ones like Greenlandic…
LW: The Cherokee invented their own just by copying letters, and they don’t map much onto what we think of as letters, so it’s fairly arbitrary in that sense. It just has to represent how the people themselves think of the language, and sufficiently well to communicate. Often there will be variations on Western orthography, using characters from Latin where possible. Tonal languages have to mark the tones somehow, by accents or by numbers.
As soon as you start leaning towards a phonetic or phonological representation, then you also start to lose dialectal differences – or you have to write the dialectal differences. Or you have conventional spelling like we have in English, but pronunciation that doesn’t really match it.
LV: When you started working on Perl, what did you take from your background in linguistics that made you think: “this is really important in a programming language”?
LW: I thought a lot about how people use languages. In real languages, you have a system of nouns and verbs and adjectives, and you kind of know which words are which type. And in real natural languages, you have a lot of instances of shoving one word into a different slot. The linguistic theory I studied was called tagmemics, and it accounts for how this works in a natural language – that you could have something that you think of as a noun, but you can verb it, and people do that all the time.
You can pretty much shove anything in any slot, and you can communicate. One of my favourite examples is shoving an entire sentence in as an adjective. The sentence goes like this: “I don’t like your I-can-use-anything-as-an-adjective attitude”!
So natural language is very flexible this way because you have a very intelligent listener – or at least, compared with a computer – who you can rely on to figure out what you must have meant, in case of ambiguity. Of course, in a computer language you have to manage the ambiguity much more closely.
Arguably in Perl 1 through to 5 we didn’t manage it quite adequately enough. Sometimes the computer was confused when it really shouldn’t be. With Perl 6, we discovered some ways to make the computer more sure about what the user is talking about, even if the user is confused about whether something is really a string or a number. The computer knows the exact type of it. We figured out ways of having stronger typing internally while still keeping the allomorphic “you can use this as that” idea.
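The allomorph idea can be mimicked outside Perl. Below is a minimal Python sketch that assumes nothing about how Perl 6 implements it: a hypothetical `IntStr` class (the name echoes the Perl 6 type of the same name) behaves as a real integer in arithmetic, yet hands back its original spelling in string context.

```python
class IntStr(int):
    """Hypothetical Python analogue of an allomorph: an int that also
    remembers the exact string it was parsed from."""

    def __new__(cls, text: str):
        # int() does the numeric parsing; the original text is kept alongside
        obj = super().__new__(cls, int(text))
        obj.text = text
        return obj

    def __str__(self) -> str:
        # In string context, give back the original spelling (e.g. "007")
        return self.text


value = IntStr("007")
print(value + 1)               # numeric use: 8
print(str(value))              # string use: 007
print(isinstance(value, int))  # True: the stronger type is still there
```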
LV: For a long time Perl was seen as the “glue” language of the internet, for fitting bits and pieces together. Do you see Perl 6 as a release to satisfy the needs of existing users, or as a way to bring in new people, and bring about a resurgence in the language?
LW: The initial intent was to make a better Perl for Perl programmers. But as we looked at some of the inadequacies of Perl 5, it became apparent that if we fixed these inadequacies, Perl 6 would be more applicable, as I mentioned in my talk – like how J. R. R. Tolkien talked about applicability [see http://tinyurl.com/nhpr8g2].
The idea that “easy things should be easy and hard things should be possible” goes way back, to the boundary between Perl 2 and Perl 3. In Perl 2, we couldn’t handle binary data or embedded nulls – it was just C-style strings. I said then that “Perl is just a text processing language – you don’t need those things in a text processing language”.
But it occurred to me at the time that there were a large number of problems that were mostly text, and had a little bit of binary data in them – network addresses and things like that. You use binary data to open the socket but then text to process it. So the applicability of the language more than doubled by making it possible to handle binary data.
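Network addresses show why embedded nulls matter. In this minimal Python sketch (not Perl code), packing the IPv4 address 192.168.0.1 into the four bytes a socket call expects yields a zero byte, and treating the result as a NUL-terminated C string throws half of it away. The address is an arbitrary example.

```python
import socket

# Pack an IPv4 address into its 4-byte binary form, as used to open a socket.
packed = socket.inet_aton("192.168.0.1")
print(packed)  # b'\xc0\xa8\x00\x01': note the embedded zero byte

# A C-style string ends at the first NUL byte, so it silently loses data.
c_style = packed.split(b"\x00", 1)[0]
print(len(packed), len(c_style))  # 4 2
```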
That began a trade-off about what things should be easy in a language. Nowadays we have a principle in Perl, and we stole the phrase Huffman coding for it, from the bit encoding system where you have different sizes for characters. Common characters are encoded in fewer bits, and rarer characters are encoded in more bits.
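For readers who haven’t met Huffman coding, here is a minimal Python sketch of the classic construction. It repeatedly merges the two least frequent subtrees, so frequent symbols end up with short codes. The function name and sample text are illustrative. Perl borrows only the principle, not the algorithm.

```python
import heapq
from collections import Counter


def huffman_code(text: str) -> dict[str, str]:
    """Build a prefix-free code in which frequent symbols get shorter bit strings."""
    freq = Counter(text)
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (total frequency, tie-breaker, {symbol: code so far})
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 and 1 respectively
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]


codes = huffman_code("eeeeeeeetttaao")
for sym in sorted(codes, key=lambda s: len(codes[s])):
    print(sym, codes[sym])
```

With this sample text, `e` (eight occurrences) gets a one-bit code, while the rarer `a` and `o` get three bits each.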
We stole that idea as a general principle for Perl, for things that are commonly used, or when you have to type them very often – the common things need to be shorter or more succinct. Another bit of that, however, is that they’re allowed to be more irregular. In natural language, it’s actually the most commonly used verbs that tend to be the most irregular.
And there’s a reason for that, because you need more differentiation of them. One of my favourite books is called The Search for the Perfect Language by Umberto Eco, and it’s not about computer languages; it’s about philosophical languages, and the whole idea that maybe some ancient language was the perfect language and we should get back to it.
All of those languages make the mistake of thinking that similar things should always be encoded similarly. But that’s not how you communicate. If you have a bunch of barnyard animals, and they all have related names, and you say “Go out and kill the Blerfoo”, but you really wanted them to kill the Blerfee, you might get a cow killed when you want a chicken killed.
So in realms like that it’s actually better to differentiate the words, for more redundancy in the communication channel. The common words need to have more of that differentiation. It’s all about communicating efficiently, and then there’s also this idea of self-clocking codes. If you look at a UPC label on a product – a barcode – that’s actually a self-clocking code where the bars and spaces for each digit always fit in a unit seven columns wide. You rely on that – you know the width of the bars will always add up to that. So it’s self-clocking.
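The barcode example can be made concrete. In UPC-A, each digit is drawn as two bars and two spaces spanning exactly seven modules. The table below holds the standard left-hand digit patterns. The decoding function is a hypothetical Python sketch that skips guard bars, parity and check digits. It shows how a scanner can use the fixed total of seven to recover module widths without knowing the print size or sweep speed.

```python
from itertools import groupby

# Standard left-hand (odd-parity) UPC-A digit patterns: 1 = bar, 0 = space.
L_CODES = {
    "0001101": 0, "0011001": 1, "0010011": 2, "0111101": 3, "0100011": 4,
    "0110001": 5, "0101111": 6, "0111011": 7, "0110111": 8, "0001011": 9,
}


def run_widths(pattern: str) -> tuple[int, ...]:
    """Widths of the alternating space/bar runs in one digit pattern."""
    return tuple(len(list(run)) for _, run in groupby(pattern))


# Every digit is exactly two spaces and two bars totalling 7 modules.
for pattern in L_CODES:
    widths = run_widths(pattern)
    assert len(widths) == 4 and sum(widths) == 7, pattern

# All left-hand patterns start with a space, so run widths identify the digit.
WIDTHS_TO_DIGIT = {run_widths(p): d for p, d in L_CODES.items()}


def decode_digit(measured: list[float]) -> int:
    """Hypothetical decoder: four measured run widths, in any unit, to a digit.

    Because the runs always total 7 modules, dividing by sum/7 recovers the
    module width without knowing the print size or scan speed.
    """
    module = sum(measured) / 7
    widths = tuple(round(w / module) for w in measured)
    return WIDTHS_TO_DIGIT[widths]


# Digit 5 is 0110001 (runs 1, 2, 3, 1), here scanned at 13.5 units per module.
print(decode_digit([13.5, 27.0, 40.5, 13.5]))  # 5
```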
There are other self-clocking codes used in electronics. In the old serial transmission protocols there were stop and start bits so you could keep things synced up. Natural languages also do this. For instance, in the writing of Japanese, they don’t use spaces. Because of the way they write it, they will have a Kanji character from Chinese at the head of each phrase, and then the endings are written in a syllabary.
LV: Hiragana, right?
LW: Yes, Hiragana. So naturally the head of each phrase really stands out with this system. Similarly, in ancient Greek, most of the verbs were declined or conjugated, so they had standard endings that were a sort of clocking mechanism. Spaces were optional in their writing system as well – it was a more modern invention to put the spaces in.
So similarly in computer languages, there’s value in having a self-clocking code. We rely on this heavily in Perl, and even more heavily in Perl 6 than in previous releases. The idea is that when you’re parsing an expression, you’re either expecting a term or an infix operator. When you’re expecting a term you might also get a prefix operator – that’s kind-of in the same expectation slot – and when you’re expecting an infix you might also get a postfix for the previous term.
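The serial-line case Larry mentions works the same way. Below is a minimal Python sketch of asynchronous framing in the common “8N1” style: one start bit, eight data bits sent least significant bit first, one stop bit and no parity. Real UARTs also sample mid-bit at an agreed baud rate, which this sketch ignores. The function names are hypothetical.

```python
def frame_byte(value: int) -> list[int]:
    """Frame one byte as 8N1 serial: start bit, 8 data bits LSB first, stop bit."""
    data = [(value >> i) & 1 for i in range(8)]  # least significant bit first
    return [0] + data + [1]  # the line idles high, so the start bit is 0


def deframe(bits: list[int]) -> bytes:
    """Recover bytes from a bit stream, resynchronising on every start bit."""
    out = bytearray()
    i = 0
    while i < len(bits):
        if bits[i] == 1:  # idle line: wait for the 0 that marks a start bit
            i += 1
            continue
        frame = bits[i:i + 10]
        if len(frame) < 10 or frame[9] != 1:
            raise ValueError(f"framing error at bit {i}")  # stop bit missing
        out.append(sum(bit << n for n, bit in enumerate(frame[1:9])))
        i += 10
    return bytes(out)


# Idle bits between frames are harmless: each start bit re-syncs the receiver.
stream = [1, 1] + frame_byte(ord("H")) + [1, 1, 1] + frame_byte(ord("i"))
print(deframe(stream))  # b'Hi'
```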
But it flips back and forth. And if the compiler actually knows which it is expecting, you can overload those a little bit, and Perl does this. So a slash when it’s expecting a term will introduce a regular expression, whereas a slash when you’re expecting an infix will be division. On the other hand, we don’t want to overload everything, because then you lose the self-clocking redundancy.
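A toy tokenizer makes the term/infix flip concrete. This Python sketch assumes an invented mini-language, not Perl’s real grammar. Numbers, names and `/.../` regex literals are terms, `-` is the only prefix operator, `++` the only postfix, and `+ - * /` are infix. The same `/` character becomes a regex or a division depending on which slot the tokenizer is in.

```python
import re

NUMBER = re.compile(r"\d+")
NAME = re.compile(r"[A-Za-z_]\w*")
REGEX = re.compile(r"/(?:\\.|[^/\\])*/")  # /.../ allowing backslash escapes
INFIX = set("+-*/")


def tokenize(src: str) -> list[tuple[str, str]]:
    """Toy tokenizer: what '/' means depends on whether a term is expected."""
    tokens: list[tuple[str, str]] = []
    expect_term = True  # an expression starts in the term slot
    pos = 0
    while pos < len(src):
        ch = src[pos]
        if ch.isspace():
            pos += 1
            continue
        if expect_term:
            if ch == "-":
                # Prefix operator: same slot as a term, still expecting a term
                tokens.append(("prefix", ch))
                pos += 1
                continue
            if ch == "/":
                m, kind = REGEX.match(src, pos), "regex"  # term slot: regex
            elif ch.isdigit():
                m, kind = NUMBER.match(src, pos), "number"
            else:
                m, kind = NAME.match(src, pos), "name"
            if m is None:
                raise SyntaxError(f"expected a term at column {pos}")
            tokens.append((kind, m.group()))
            pos = m.end()
            expect_term = False  # after a term comes an infix (or a postfix)
        elif src.startswith("++", pos):
            # Postfix operator: binds to the previous term, slot unchanged
            tokens.append(("postfix", "++"))
            pos += 2
        elif ch in INFIX:
            tokens.append(("infix", ch))  # infix slot: '/' means division
            pos += 1
            expect_term = True
        else:
            raise SyntaxError(f"expected an infix operator at column {pos}")
    return tokens


print(tokenize("total / count"))
print(tokenize(r"/\d+/"))
```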
Most of our best error messages, for syntax errors, actually come out of noticing that you have two terms in a row. And then we try to figure out why there are two terms in a row – “oh, you must have left a semicolon out on the previous line”. So we can produce much better error messages than the more ad-hoc parsers.
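The two-terms-in-a-row check is easy to sketch. This hypothetical Python checker works on an already-tokenized stream of `(kind, text, line)` tuples. The heuristic of blaming a missing semicolon when the second term starts on a new line is an assumption for illustration, not how Rakudo actually decides or words its messages.

```python
TERM_KINDS = {"number", "name", "string"}


def check_terms(tokens: list[tuple[str, str, int]]) -> list[str]:
    """Report places where a term directly follows another term.

    Two terms in a row are never valid in this toy grammar, so the
    checker guesses at the most likely cause from the line numbers.
    """
    errors = []
    for prev, cur in zip(tokens, tokens[1:]):
        if prev[0] in TERM_KINDS and cur[0] in TERM_KINDS:
            if cur[2] > prev[2]:
                hint = f"missing semicolon at end of line {prev[2]}?"
            else:
                hint = "missing operator or comma?"
            errors.append(f"line {cur[2]}: two terms in a row ({prev[1]} {cur[1]}); {hint}")
    return errors


tokens = [
    ("name", "x", 1), ("infix", "=", 1), ("number", "1", 1),  # no semicolon
    ("name", "y", 2), ("infix", "=", 2), ("number", "2", 2),
]
print(check_terms(tokens))
```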
LV: Why has Perl 6 taken fifteen years? It must be hard overseeing a language when everyone has different opinions about things, and there’s not always a right way and a wrong way to do things.
LW: There had to be a very careful balancing act. There were just so many good ideas at the beginning – well, I don’t want to say they were all good ideas. There were so many pain points, like there were 361 RFCs [feature proposal documents] when I expected maybe 20. We had to sit back and actually look at them all, and ignore the proposed solutions, because they were all over the map and all had tunnel vision. Each one may have just changed one thing, but if we had done them all, it would’ve been a complete mess.
So we had to re-rationalise based on how people were actually hurting when they tried to use Perl 5. We started to look at the unifying, underlying ideas. Many of these RFCs were based on the fact that we had an inadequate type system. By introducing a more coherent type system we could fix many problems in a sane fashion and a cohesive fashion.
And we started noticing other ways we could unify the feature sets and start reusing ideas in different areas. Not necessarily because they were the same thing underneath. We have a standard way of writing pairs – well, two ways in Perl! But the way of writing pairs with a colon could also be reused for radix notation, or for literal numbers in any base. It could also be used for various alternative forms of quoting. We say in Perl that it’s “strangely consistent”.
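In Perl 6, `:name<Larry>` is a pair and `:16<FF>` is the number 255. A minimal Python sketch (a hypothetical helper, not the Perl 6 parser) shows how one surface shape can be read two ways, depending only on whether the part after the colon is a number.

```python
import re

# One shape, ":head<body>", read two ways depending on the head.
COLON_FORM = re.compile(r":(\w+)<([^>]*)>")


def read_colon_form(src: str):
    """Interpret ':head<body>' in the spirit of Perl 6's reused colon syntax.

    A numeric head means a radix literal (':16<FF>' is 255); any other
    identifier makes a key/value pair (':name<Larry>').
    """
    m = COLON_FORM.fullmatch(src)
    if m is None:
        raise ValueError(f"not a colon form: {src!r}")
    head, body = m.groups()
    if head.isdigit():
        return int(body, int(head))  # radix literal: body read in base `head`
    return (head, body)  # ordinary pair


print(read_colon_form(":16<FF>"))       # 255
print(read_colon_form(":2<1010>"))      # 10
print(read_colon_form(":name<Larry>"))  # ('name', 'Larry')
```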
Similar ideas pop up, and you say “I’m already familiar with how that syntax works, but I see it’s being used for something else”. So it took some unity of vision to find these unifications. People who had the various ideas and made early implementations of Perl 6 came back to me, cap-in-hand, and said “We really need a language designer. Could you be our benevolent dictator?”
So I was the language designer, but I was almost explicitly told: “Stay out of the implementation! We saw what you made out of Perl 5, and we don’t like it!” It was really funny because the innards of the new implementation started looking a whole lot like Perl 5 inside, and maybe that’s why some of the early implementations didn’t work well.
Because we were still feeling our way into the whole design, the implementations made a lot of assumptions about what the VM should and shouldn’t do, so we ended up with something like an object oriented assembly language. That sort of problem was fairly pervasive at the beginning. Then the Pugs guys came along and said “Let’s use Haskell, because it makes you think very clearly about what you’re doing. Let’s use it to clarify our semantic model underneath.”
So we nailed down some of those semantic models, but more importantly, we started building the test suite at that point, to be consistent with those semantic models. Then after that, the Parrot VM continued developing, and then another implementation, Niecza, came along and it was based on .NET. It was by a young fellow who was very smart and implemented a large subset of Perl 6, but he was kind of a loner, and didn’t really figure out a way to get other people involved in his project.
At the same time the Parrot project was getting too big for anyone to really manage it inside, and very difficult to refactor. At that point the fellows working on Rakudo decided that we probably needed to be on more platforms than just the Parrot VM. So they invented a portability layer called NQP which stands for “Not Quite Perl”. They first ported it to run on the JVM (Java Virtual Machine), and while they were doing that they were also secretly working on a new VM called MoarVM. That became public a little over a year ago.
Both MoarVM and JVM run a pretty much equivalent set of regression tests – Parrot is kind-of trailing back in some areas. So that has been very good to flush out VM-specific assumptions, and we’re starting to think about NQP targeting other things. There was a Google Summer of Code project last year to target NQP to JavaScript, and that might fit right in, because MoarVM also uses libuv, the library underneath Node.js, for much of its more mundane processing.
We probably need to concentrate on MoarVM for the rest of this year, until we actually define 6.0, and then the rest will catch up.
LV: Last year in the UK, the government kicked off the Year of Code, an attempt to get young people interested in programming. There are lots of opinions about how this should be done – like whether you should teach low-level languages at the start, so that people really understand memory usage, or a high-level language. What’s your take on that?
LW: Up until now, the Python community has done a much better job of getting into the lower levels of education than we have. We’d like to do something in that space too, and that’s partly why we have the butterfly logo, because it’s going to be appealing to seven-year-old girls!
But we do think that Perl 6 will be learnable as a first language. A number of people have surprised us by learning Perl 5 as their first language. And you know, there are a number of fairly powerful concepts even in Perl 5, like closures, lexical scoping, and features you generally get from functional programming. Even more so in Perl 6.
Part of the reason Perl 6 has taken so long is that we have around 50 different principles we try to stick to, and in language design you end up juggling everything and saying “what’s really the most important principle here?” There has been a lot of discussion about a lot of different things. Sometimes we commit to a decision, work with it for a while, and then realise it wasn’t quite the right decision.
We didn’t design or specify pretty much anything about concurrent programming until someone came along who was smart enough about it and knew what the different trade-offs were, and that’s Jonathan Worthington. He has blended together ideas from other languages like Go and C#, with concurrent primitives that compose well. Composability is important in the rest of the language.
There are an awful lot of concurrent and parallel programming systems that don’t compose well – like threads and locks, and there have been lots of ways to do it poorly. So in one sense, it’s been worth waiting this extra time to see some of these languages like Go and C# develop really good high-level primitives – that’s sort of a contradiction in terms – that compose well.
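Perl 6’s own concurrency types (such as `Promise` and `Channel`) aren’t shown here. As an analogue of “high-level primitives that compose”, here is a minimal Python `asyncio` sketch in which “wait for all”, “take the first” and “give up after a timeout” are built from the same future objects with no explicit threads or locks. `fetch` is a hypothetical stand-in that just sleeps.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    """Stand-in for some I/O-bound job (hypothetical; it just sleeps)."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> tuple[list[str], list[str], bool]:
    # "All of these": results come back in argument order.
    everything = await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))

    # "First of these": the same futures, combined a different way.
    done, pending = await asyncio.wait(
        {asyncio.create_task(fetch("fast", 0.01)), asyncio.create_task(fetch("slow", 1))},
        return_when=asyncio.FIRST_COMPLETED,
    )
    first = [t.result() for t in done]
    for t in pending:
        t.cancel()  # cancellation composes too: no lock to release by hand

    # "Give up after a timeout": wrap any awaitable without changing it.
    try:
        await asyncio.wait_for(fetch("too slow", 1), timeout=0.05)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    return everything, first, timed_out


everything, first, timed_out = asyncio.run(main())
print(everything, first, timed_out)
```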
“why Perl 6 has taken so long (Perl 5 was released in 1994)”
Stating Perl 5 was released in 1994 in the same sentence as Perl 6 took a while is being really disingenuous to both projects. Perl 5.22 was released June 1, 2015 and is very different to the first Perl 5 release. Perl 6 has also not been in development since 1994, especially not the implementation around today.
“Perl 6 has also not been in development since 1994”
Yes, which is why we said in the article: “Perl 6 has been 15 years in the making”. Development of the design began in 2000 at the Perl Conference.
“disingenuous to” is ungrammatical nonsense. And you are baselessly imputing bad faith. And note that LW had an opportunity to challenge that, but didn’t … perhaps because there’s simply no substance to your complaint. In fact, I would say that your complaint and your rationalizations are disingenuous.
“Perl 6 has also not been in development since 1994”
Completely and utterly irrelevant. Note that the question was “Why has Perl 6 taken fifteen years?” … 1994 is not 15 years ago.
Coming at computer language design from different perspectives is an excellent approach, which is why perl is such an interesting language. Larry Wall is without doubt a very cool guy, and reading his tangents – which find themselves back at the main point – is refreshing to say the least. He’s a man that’s clearly more well-rounded in his knowledge and outlook than many single focus computer science geniuses out there, and I would rather have a few pints with him than any of them.
Great interview, thoroughly enjoyed.
You’re making sweeping generalizations about people you know nothing about. One can compliment Larry Wall without tearing others down.
I find it ironic that you counsel this gent on not tearing others down when that’s all I’ve seen you do with every post here. Regardless of the merit of your arguments you’re coming across as some kind of arrogant, self-appointed truth police.
When you grow up you might find that not every statement, false or not, needs to be responded to. Many of us here can filter this stuff out ourselves, we don’t need you to do it.
What you’re accomplishing is the discouragement of any discussion at all because most people don’t have the energy or desire to tangle with some nit-picking narcissist that’s probably spewed more than a few untruths in his day as well.
“We probably need to concentrate on MoarVM for the rest of this year, until we actually define 6.0, and then the rest will catch up.”
Hoo-boy.
I spent 13 years coding in Perl 5 variants. I enjoyed it, but never believed for a second that a real Perl 6 would come out. Some in the community have suggested that one of the upcoming releases of Perl 5 be termed ‘Perl 7’, and just be on with it from there. Not a bad idea, considering that Perl 5 is a perfectly good programming language.
I have seen the hype that says “this is the year” for Perl 6 more than once. I’ve got no more reason to believe it this time around than any other.
You’ve ‘seen hype’ about Perl 6’s imminent release but never from the mouth of Larry Wall. I was at the perl 6 talks at FOSDEM this year, they were all pretty clear that it was 8-10 months more work and 6.0 would be released.
The modern perl6 is as different from 2001 perl6 spec as perl 5.0 is from perl 5.22. They really should have called it ‘perl++’ or something instead of ‘perl6’, but, eh, whatever.
I don’t know PERL. In fact, I’m not a programmer, but I do need to write scripts every once in a while. But I was impressed with the “Huffman” idea of most used language elements being shorter.
Language proponents forget that we all have things to get done, and could care less of elegance, especially if excessively verbose. For instance MS PowerShell, I hate it! It’s as if MS comp lang geeks went to Europe and visited every tiny university and stuffed all their wacky languages into what is supposed to be a script! My favorite kind of scripting is to try and write one-line UNIX jobbies. I should probably learn PERL if I get further into sysadmin stuff.
The notion that PowerShell represents academic language design is bizarre … especially coming from a non-programmer who doesn’t know Perl and is contrasting it with PowerShell.
I was of that opinion too, coming from a background of UNIX programming in a Pathology laboratory setting. Perl ruled the roost. We used to say that XYZ lab was built on Perl. Then I moved labs and worked at a Windows shop, and I could get a lot done with Strawberry Perl, but no-one there knew Perl and refused to learn it. It is arcane to Windows programmers. So for the sake of the team looking after the lab software I switched to Powershell. It uses the same pipeline style as UNIX and Perl, and although the commands seem long-winded, the TAB button comes in handy for auto-complete. I can now achieve what I thought was pretty cool in perl, on Windows, and everyone in the team can understand it. Powershell is very similar to Perl in many ways, and ironically Windows people don’t really understand that it is just a more verbose (and probably easier to learn) rip-off of Perl.
I’ve been writing Perl since v4, just before the v5 release – it’s my Swiss Army knife. I’m using C# and JavaScript mostly these days, but today I’m updating a Perl database load script.
Don’t know if I’ll get a chance to learn and use Perl 6 before I retire – hope so. Thanks Larry, for Perl 5, and thank you both for the update!
This is all very sad to me.
I have been a Perl programmer for many years. Have written a few cpan modules myself.
The community has totally languished. Most cpan modules are stale. Going to cpan is like walking the grand halls of a once lively kingdom and admiring the incredibly meticulous artwork.
I put Perl in the camp of cobol now. Tons of production code to maintain and some small splinters of people trying to adopt best practices from other languages into Perl. But basically a zombie.
I’m really annoyed.
I’m really annoyed because I think this all comes down to a lack of unity and leadership.
Patrick.
Patrick, don’t be annoyed [actually, my guess is you were just trying to provoke a heartfelt response]. My experiences using Perl 5 (modern Perl), and now Rakudo Perl6 (yes, anyone can use it today) are altogether more positive than the impression you give to people. Perl6 is still a lot like Perl5; you can still hack with it, but it has a much cleaner syntax, and has a lot more functionality should you need it. With the advent of Inline::Perl5, a Perl5 module from CPAN can easily be run from within a Perl6 script, thereby bridging that particular gap between the two sister languages. In my view, it’s an exciting time for the Perl community, and I’d like to commend the dedicated team of very bright individuals, led by Larry Wall, on their brilliant work done for the greater good.
Great article. I loved the recursive caching feature which made my perl5 program run 30 times faster in perl6. Unfortunately I implemented the same caching layer in my perl5 program and it ran even faster. But the sad point was that it was even faster in python.
It may be too late for Perl version 6. It appears that Ruby, Scala, Python, Javascript (in the form of Node.js) and Caml have far more momentum than Perl. Heck, even PHP and Java have more momentum (than Perl) notwithstanding that they are both lousy languages, at least as compared to Ruby. I hope that Perl can be revivified, but I am not holding my breath.
“And of those three, the missionaries are the least damaging, if they’re doing their job right.”
That is one of the most ABSURD things ever said. Most of these cultures already have their own religion. It’s because theirs is not good enough.
I’m someone of African ascendance who wasn’t reared in the U.S. and if there is one thing that is detrimental to identity it’s religions which make people feel back and hate themselves and how they look and are perceived.
BK.
‘s/feel back/feel bad/’