Project Nayuki


Being a polyglot programmer

Introduction

Today I write program code in multiple languages – usually Java, JavaScript, Python, C, C++, and x86 assembly. This wasn’t always the case, and my first 5 years of programming experience were spent almost exclusively on Java. Although I did get exposed to other languages occasionally, it was only much later that I got into the habit of regularly programming in languages besides Java. Nowadays I choose the language for new projects based on convenience and requirements, and when I publish algorithms I go the extra mile to provide code in multiple languages for the reader’s convenience.

I feel that there are many benefits to being a multi-lingual (polyglot) programmer. Because not everyone has the time, circumstances, or motivation to be multi-lingual, I hope this article will raise awareness of the phenomenon and start discussions on the pros and cons of this way of working.

Overall observations and opinions

Adopting habits from stricter languages
  • In dynamically typed languages like Python and JavaScript, I find it helpful to mentally label the type of each variable – for example integer, string, or list of integers. Keeping track of variable types this clearly essentially eliminates type errors. Sometimes I even explicitly annotate the variable types in code comments (e.g. at the top of qrcodegen.py). Of course this way of thinking is mandatory when working in a statically typed language.

  • JavaScript only has floating-point numbers, not integers (though it has bitwise integer operations). Most other languages distinguish between floats and ints, and observing this distinction in JavaScript is helpful for writing correct, efficient code that can be potentially translated to other languages.

  • Python uses code indentation to denote block structure, so correct indentation is mandatory by design. In other languages it is a very good idea to maintain perfect indentation all the time too, otherwise the code becomes confusing and messy to read.

  • Haskell does not have an implicit notion of a null value, and requires you to use the Maybe type to express a value that can be either present or absent. I find that in Java code, null values are rarely useful, and I frequently write comments or assertions to denote that a value must not be null.

  • Encapsulation isn’t taken very seriously in Python or JavaScript. There are idiomatic ways to differentiate public and private members, but they are not always used by coders. Compare this to C++, C#, and Java, where declaring members as public or private is taken seriously at the outset.

  • Python has duck typing, so inheriting from a blank abstract superclass doesn’t do anything useful. But this design pattern is mandatory in statically typed languages like C++, C#, and Java, and declaring the inheritance is useful in expressing the data type relationships and design intent to a programmer reading the code. An example can be found in my optimizing brainfuck compiler in the Command class.
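
Several of the habits above can be sketched together in Python. This is only an illustration: the names Instruction, Add, and run are hypothetical and are not taken from qrcodegen.py or the brainfuck compiler. Types are noted in comments, required values are asserted to be non-None, a leading underscore marks private fields, and a blank superclass documents the type relationship.

```python
class Instruction(object):
    """Blank abstract superclass; it only documents the type hierarchy."""
    pass


class Add(Instruction):
    def __init__(self, offset, value):
        # offset: int, value: int (neither may be None)
        assert offset is not None and value is not None
        self._offset = offset  # Private by convention (leading underscore)
        self._value = value

    def apply(self, memory):
        # memory: list of int; modified in place, returns None
        memory[self._offset] = (memory[self._offset] + self._value) % 256


def run(instructions, memory):
    # instructions: list of Instruction, memory: list of int
    for inst in instructions:
        assert isinstance(inst, Instruction)  # States the design intent
        inst.apply(memory)
    return memory
```

Duck typing would let run() accept any object with an apply() method, so the isinstance assertion is purely a statement of intent for the reader.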

Avoiding language-specific quirks
  • JavaScript is one of the few popular languages where reading an uninitialized variable or non-existent array index yields a legal undefined value instead of throwing an exception. Relying on this behavior is ugly and translates poorly to other languages. For example, instead of writing a[3] == undefined, you should write a.length <= 3.

  • JavaScript lets you declare a variable multiple times in a function (e.g. var x; var x;), which has the same effect as declaring it just once. In Java this kind of code is a compile-time error. In C/C++ this is an error if the variable is re-declared in the same block, but if declared in a nested block then a new variable shadows the old variable.

  • In JavaScript, setting the value of a non-existent array index (for example a[a.length] = 8;) will automatically grow the array. This is even faster in some JavaScript engine implementations. However I think this code is poor style and translates poorly to other languages; it is much clearer to write a.push(8);.
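
To see why the portable style matters, here is a small Python sketch (variable names are arbitrary) of what happens when the JavaScript-specific habits are carried over. Python rejects both out-of-range accesses, whereas the explicit length check and the append call translate directly.

```python
a = [2, 7, 4]

# Portable: an explicit bounds check (a.length <= 3 in JavaScript)
index_3_missing = len(a) <= 3

# Not portable: reading a non-existent index raises in Python
try:
    a[3]
    read_failed = False
except IndexError:
    read_failed = True

# Not portable: assigning one past the end raises in Python
try:
    a[len(a)] = 8
    write_failed = False
except IndexError:
    write_failed = True

# Portable: the counterpart of a.push(8)
a.append(8)
```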

Getting easy access to features
  • List and dictionary literals: Python, JavaScript
  • Unicode text handling: Python, Java, JavaScript
  • Regular expressions: Java, Python, JavaScript
  • Efficient int and float arithmetic: C, C++, C#, Java
  • Bigint: Java, Python, C#
  • File I/O: Java, Python, C, C++, C#
  • Network sockets: Java, Python
  • JSON: Python, JavaScript
  • ZIP, DEFLATE: Java, Python
  • Easy to compile/run on Unix systems: C, C++, Python
Concepts better explained in another language
  • In Java, the distinction between values and references can be confusing for a beginner programmer. Even as the programmer matures in Java, the two behaviors are likely to be seen as magic. On the other hand, a programmer who works in C or C++ is forced at an early stage to understand the distinction between an immediate value/struct and a pointer to a value/struct.

  • Java is a mature, thought-out, and well-documented object-oriented language with a managed runtime. Python is object-oriented and has a runtime too, but its documentation lacks the depth and rigor that the Java ecosystem has. I find that concurrent programming (threads, locks, condition variables, etc.) and garbage collection (when are objects reclaimed, finalizer functions, weak references, etc.) are inadequately documented in Python, rarely used, and/or rarely discussed. I learned about these concepts largely from the Java ecosystem (from official documentation and third-party articles), and found it helpful to mentally translate the concepts in the Java-oriented articles into the vocabulary of the Python language and libraries.

  • In Python you might be able to argue that putting a value into a dictionary, like d[x] = y, is an atomic operation that doesn’t need locking. It is likely to be correct in the CPython interpreter because the dictionary method is implemented as a C function, and C functions cannot be preempted by other Python threads even if they need to execute numerous instructions. But in Java this kind of code is strictly incorrect, and must be surrounded with an appropriate lock and unlock.
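
As a minimal sketch of carrying the Java locking discipline over to Python, the code below guards a shared dictionary with threading.Lock. The names counts, counts_lock, and record are made up for illustration. Note that the update here is a read-modify-write, which is more than a single d[x] = y store and so should not be assumed atomic even in CPython.

```python
import threading

counts = {}  # dict mapping str to int, shared by all threads
counts_lock = threading.Lock()


def record(key, times):
    # key: str, times: int
    for _ in range(times):
        # Lock explicitly instead of relying on interpreter details
        with counts_lock:
            counts[key] = counts.get(key, 0) + 1


def main():
    threads = [threading.Thread(target=record, args=("x", 10000))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts["x"]
```

The with statement plays the role of Java’s synchronized block: the lock is released even if the body raises an exception.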

Expanding the audience inexpensively
  • Compared to most programmers, I spend a lot of time designing algorithms in addition to solving pragmatic problems. For example I worked on data structures like AVL tree list, broad techniques like Huffman coding, and non-trivial real-world standards like QR Codes. The time that I spend on the math and algorithms is a large one-time cost that is independent of whatever language I choose to implement the ideas in. After I draft my first implementation, I deal with the bugs with regard to the architecture, data flow, computation of correct values, low-level logic, and so on. This debugging is costly too, because it takes deliberate effort to turn a pile of informal mathematical/algorithmic ideas into a concrete implementation in a real, executable programming language.

  • But after a working and debugged first implementation is created, it is smooth sailing from there on out. Adding another language implementation of the same concept is quite cheap compared to all the preparation done upfront to produce the first implementation. My usual approach is to copy and paste the code, and work line by line to fix the syntax and change the library calls. For example when translating from Java to Python, I would replace if (x) with if x:, && with and, etc. My programs are usually runnable or have a runnable test suite, so I can quickly check that the translated version can run properly.

  • By porting my programs to multiple languages, there is a higher chance that a reader understands one of the languages that I published in. This makes it easier to disseminate my ideas to a wider audience.
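
The line-by-line porting approach can be sketched as follows. The gcd function is a hypothetical example, not code from one of the published libraries; each Java line is kept as a comment above its Python translation, and a tiny test suite confirms that the port runs properly.

```python
# static int gcd(int x, int y) {
def gcd(x, y):
    # if (x < 0 || y < 0)
    if x < 0 or y < 0:
        # throw new IllegalArgumentException("Negative argument");
        raise ValueError("Negative argument")
    # while (y != 0) {
    while y != 0:
        # int z = x % y;  x = y;  y = z;
        x, y = y, x % y
    # return x;
    return x


# Small runnable test suite for checking the translated version
def test_gcd():
    assert gcd(0, 0) == 0
    assert gcd(12, 18) == 6
    assert gcd(17, 5) == 1
    assert gcd(7, 0) == 7
```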

Gaining insight into an algorithm’s core
  • By expressing an algorithm or data structure in multiple languages, you will discover which parts stay the same and which parts change. Parts that usually stay the same include the control flow, data flow, and data structures (array, dictionary, tree, etc.).

  • Java’s lack of operator overloading makes BigInteger arithmetic look ugly and unreadable. By comparison, bigint arithmetic is very readable in Python and Mathematica due to native support.

  • The explicit static typing in C, C++, C#, Java, etc. is noisy and can hurt readability when glancing at the big picture of an algorithm. But it is useful when drilling down to the details to produce correct runnable code.

  • The explicit memory deallocation in C can be a distraction that makes algorithms harder to understand. This is much less of a problem in C++ due to RAII and containers like std::vector, std::string, smart pointers, etc. This problem is non-existent in garbage-collected languages like Java, JavaScript, Python, etc., because memory deallocation is almost always implicit (except for developers who implement structures like ArrayList).

  • The act of porting code to another language gives me an excuse to re-read and think about the entire codebase. This often leads to simplifications and other beneficial revisions being made to the code for clarity.
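
To illustrate the readability point about bigint arithmetic, here is a sketch of modular exponentiation in Python. The function name modpow is chosen only for this example. The comments show roughly how the same loop body would read in Java with BigInteger method calls.

```python
def modpow(base, exponent, modulus):
    # base: int, exponent: int (non-negative), modulus: int (positive)
    # Roughly the same loop body in Java with BigInteger:
    #   if (exponent.testBit(0))
    #     result = result.multiply(base).mod(modulus);
    #   base = base.multiply(base).mod(modulus);
    #   exponent = exponent.shiftRight(1);
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:
            result = result * base % modulus
        base = base * base % modulus
        exponent >>= 1
    return result
```

The control flow and data flow are identical in both languages; only the notation for each arithmetic step differs.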


Code examples

The first few interactive examples illustrate how similar Python, JavaScript, Java, and C++ are in their syntax and semantics. When I write code in one of these languages, I can port it to another language by reading and writing one line at a time, fixing up the syntax and changing the library API calls along the way. In particular, I don’t need to change the overall architecture, control flow, or data flow.

Fibonacci function

This example demonstrates function declarations, types, variables, integers, arithmetic, if-statements, for-loops, and throwing exceptions.

Python

# Correct for all non-negative n, thanks to native bigint
def fibonacci(n):
  if n < 0:
    raise ValueError("Negative index")
  a, b = 0, 1
  for i in range(n):
    a, b = b, a + b
  return a

JavaScript

// For simplicity in this illustration,
// we don't worry about large values of n that
// cause rounding errors or numeric overflow
function fibonacci(n) {
  if (n < 0)
    throw "Negative index";
  var a = 0, b = 1;
  for (var i = 0; i < n; i++) {
    var c = a + b;
    a = b;
    b = c;
  }
  return a;
}

Java

// For simplicity in this illustration, we don't worry
// about large values of n that cause numeric overflow
int fibonacci(int n) {
  if (n < 0)
    throw new IllegalArgumentException("Negative index");
  int a = 0, b = 1;
  for (int i = 0; i < n; i++) {
    int c = a + b;
    a = b;
    b = c;
  }
  return a;
}

C++

// For simplicity in this illustration, we don't worry
// about large values of n that cause numeric overflow
int fibonacci(int n) {
  if (n < 0)
    throw "Negative index";
  int a = 0, b = 1;
  for (int i = 0; i < n; i++) {
    int c = a + b;
    a = b;
    b = c;
  }
  return a;
}

List manipulation

This example illustrates data declarations, the list ADT, and working around missing features with loops and temporary variables.

Python

a = [2, 7, 4]
a.insert(0, 9)
a.append(3)
del a[1]

b = [1, 5, 6]
a.extend(b)
print(a)
# [9, 7, 4, 3, 1, 5, 6]

print(sum(a))  # 35

JavaScript

var a = [2, 7, 4];
a.splice(0, 0, 9);
a.push(3);
a.splice(1, 1);

var b = [1, 5, 6];
a.push.apply(a, b);
console.log(a);
// [9, 7, 4, 3, 1, 5, 6]

var sum = 0;
a.forEach(function(x) {
  sum += x;
});
console.log(sum);  // 35

Java

List<Integer> a = new ArrayList<>();
Collections.addAll(a, 2, 7, 4);
a.add(0, 9);
a.add(3);
a.remove(1);

List<Integer> b = new ArrayList<>();
Collections.addAll(b, 1, 5, 6);
a.addAll(b);
System.out.println(a);
// [9, 7, 4, 3, 1, 5, 6]

int sum = 0;
for (int x : a)
  sum += x;
System.out.println(sum);  // 35

C++

std::vector<int> a{2, 7, 4};
a.insert(a.begin(), 9);
a.push_back(3);
a.erase(a.begin() + 1);

std::vector<int> b{1, 5, 6};
a.insert(a.end(), b.begin(), b.end());

std::cout << "[";
bool head = true;
for (std::vector<int>::iterator
    it(a.begin()); it != a.end(); ++it) {
  if (head) head = false;
  else std::cout << ", ";
  std::cout << *it;
}
std::cout << "]" << std::endl;
// [9, 7, 4, 3, 1, 5, 6]

int sum = 0;
for (auto it(a.begin()); it != a.end(); ++it)
  sum += *it;
std::cout << sum << std::endl;  // 35

Classes and objects

This example shows the declaration of a class, constructors, methods, public and private fields, and working with objects.

Python

class Person(object):
  
  # Constructor
  def __init__(self, nm):
    # Creation of fields
    self.name = nm
    self._salary = 0
  
  # Method
  def get_lowercase_name(self):
    return self.name.lower()
  
  # Method
  def get_salary_multiple(self, x):
    return str(self._salary * x)
  
  # Method
  def raise_salary(self):
    self._salary += 1

def main():
  p = Person("Alex")
  print(p.name)
  p.raise_salary()
  print(p.get_salary_multiple(5))

JavaScript

// Constructor
function Person(nm) {
  // Public field
  this.name = nm;
  // Private variable
  var salary = 0;
  
  // Public method that accesses a private variable
  this.getSalaryMultiple = function(x) {
    return (salary * x).toString();
  };
  
  // Public method that accesses a private variable
  this.raiseSalary = function() {
    salary++;
  };
}

// Shared method for all instances. This approach works
// for functions that only access public fields/methods
Person.prototype.getLowercaseName = function() {
  return this.name.toLowerCase();
};

function main() {
  var p = new Person("Alex");
  console.log(p.name);
  p.raiseSalary();
  console.log(p.getSalaryMultiple(5));
}

Java

public class Person {
  
  // Fields
  public String name;
  private int salary;
  
  // Constructor
  public Person(String nm) {
    name = nm;
    salary = 0;
  }
  
  // Method
  public String getLowercaseName() {
    return name.toLowerCase();
  }
  
  // Method
  public String getSalaryMultiple(int x) {
    return Integer.toString(salary * x);
  }
  
  // Method
  public void raiseSalary() {
    salary++;
  }
}

void main() {
  Person p = new Person("Alex");
  System.out.println(p.name);
  p.raiseSalary();
  System.out.println(p.getSalaryMultiple(5));
}

C++

class Person {
  
  // Fields
public:
  std::string name;
private:
  int salary;
  
  // Constructor
public:
  Person(std::string nm) :
    name(nm),
    salary(0) {}
  
public:
  
  // Method
  std::string getLowercaseName() {
    std::string result(name);
    std::transform(result.begin(), result.end(),
      result.begin(), ::tolower);
    return result;
  }
  
  // Method
  std::string getSalaryMultiple(int x) {
    return std::to_string(salary * x);
  }
  
  // Method
  void raiseSalary() {
    salary++;
  }
};

int main() {
  Person p("Alex");
  std::cout << p.name << std::endl;
  p.raiseSalary();
  std::cout << p.getSalaryMultiple(5) << std::endl;
}

Libraries published on my other pages

Among my published articles, there are many examples where I implement an algorithm, data structure, or application in multiple languages. These examples show how I map a real-world problem into code in different languages, and how the resulting pieces of code compare in terms of clarity, conciseness, and explicitness.

Next lexicographic permutation algorithm
  • ~25 lines of code per implementation
  • Featuring arrays, loops, integers, generic types
Free small FFT in multiple languages
  • ~200 lines of code per implementation
  • Featuring floating-point types and arithmetic, trigonometry, bitwise operations, memory allocation, arrays
  • Python has a huge advantage in conciseness due to native support for complex numbers
Forcing a file’s CRC to any value
  • ~200 lines of code per implementation
  • Featuring file I/O, string parsing, exception handling, bitwise integer math
  • The C code looks obtuse in string handling and explicit error handling
AVL tree list
  • ~400 lines of code per implementation
  • Featuring tree graphs, recursion, reference manipulation, ADT design, encapsulation
  • The data structure is difficult and the algorithm deals with subtle details. To produce the first working implementation, I spent a lot of time on understanding how the theoretical algorithm works and on debugging my code. When I ported the code to other languages, these one-time costs did not apply.
QR Code generator library
  • ~1000 lines of code per implementation
  • Featuring object-oriented design, module/component arrangement, bitwise integer math, arrays
  • There was a large upfront cost in reading and understanding the QR Code specification, along with producing the first working implementation
Overview of Project Nayuki software licenses
  • A list of all code I published on my web site, which includes the topic/title and the list of programming languages
  • About 40% of my pages have multi-lingual program code; the remaining 60% are mono-lingual
  • In decreasing order of popularity, I use Java (most often), JavaScript, Python, C, and so on.
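
As a small illustration of the conciseness remark in the FFT entry above, the sketch below computes a naive discrete Fourier transform using Python’s built-in complex numbers. It is a quadratic-time DFT written for this page of notes, not the FFT code from that article, and the function name dft is an assumption.

```python
import cmath


def dft(vector):
    # vector: list of complex (or float); returns list of complex
    n = len(vector)
    result = []
    for k in range(n):
        total = 0j
        for t in range(n):
            angle = -2 * cmath.pi * t * k / n
            # Complex multiplication and addition need no helper class
            total += vector[t] * cmath.exp(1j * angle)
        result.append(total)
    return result
```

In a language without a native complex type, each multiplication and addition above would have to be spelled out on separate real and imaginary parts.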

My opinions on specific languages

Disclaimer: These are personal opinions based on my needs, skills, and weaknesses. There is no single right answer or universal truth about a language or feature being good or bad.

Java

A general-purpose object-oriented language that is safe, relatively concise, and fast enough, and that has a decent standard library. I like the static typing (easy refactoring and saves me from so many trivial mistakes), well-defined language and library semantics (predictable behavior and good documentation), platform independence (no need to deal with endianness, bit width, and platform tooling issues), decent speed for low-level numeric processing, Unicode text support, file and network I/O APIs, and competent multithreading facilities (built-in locks and monitors, memory model, concurrency libraries). Java is my default/preferred language for many projects for these reasons; also I enjoy using the powerful Eclipse IDE. When I write a program intended to be ported to multiple languages, I almost always start with the Java version; after it is fully functional and debugged, I translate it to other languages (such as adding header files and prototypes for C++, or stripping out types for JavaScript and Python). Things I dislike about Java include the fact that it is too heavyweight for quick throwaway scripts, it has a bunch of little quirks like signed bytes and type erasure, it doesn’t support ad hoc list and dictionary data structures like Python and JavaScript do, and people write bloated “enterprisy” code in it. On a final note, I’m not convinced that JVM languages like Scala, Groovy, and Clojure are worth the trouble compared to plain old Java.

Python

A lightweight object-oriented language that is concise, powerful, relatively safe, slow, and possessing a rich standard library. I like that it lives up to its reputation of being executable (and readable) pseudocode, its enforcement of proper code indentation, its practical and feature-packed standard library, the fact that you can declare lists and dictionaries in the code, the power of list comprehensions and generators, and its general lightweight feeling when reading/writing code. Python is my preferred language for writing short scripts and making small pieces of code to prove a point in a concretely implemented way. I don’t find it appropriate for large projects with many modules and data types. I dislike the Python 2 vs. 3 transition mess (note that I usually write 2/3 polyglot code to appeal to both crowds), how some libraries changed names and features in a subtle way (unlike the stability of Java libraries), how some changes were made gradually and functions added gradually (instead of all at once like in Java), the poor protection against simple typos and data type errors due to the dynamic typing, and the slow numerical performance (about 10× to 100× slower than Java, which is in turn slower than C; this is painfully apparent in my Project Euler solutions benchmark timings).

JavaScript

The lingua franca of web programming, JavaScript is a weakly typed, object-oriented, moderate speed language with only a basic standard library. It is superficially similar to Java in syntax and a couple of libraries (such as Math, String, Date), but is quite different in how values, objects, and functions work. The two main things I like about JavaScript are that it enables me to use HTML web pages as a powerful input and output medium, and that its syntax and semantics are reasonably sane (unlike say C++ or PHP). I also like how functions are first-class values, how easy it is to declare a function in an expression, and how smart people created JIT compilers that made JavaScript much faster than similar dynamic languages like Python and Ruby (JavaScript is about 5× slower than Java in arithmetic; Python is about 30× slower). I avoid problems related to JavaScript’s weak typing by thinking about and writing code in the mindset of static typing – in other words, I consider each variable to have one type (such as integer) and avoid doing potentially nonsensical operations that involve mixed types (such as integer < string). I dislike how I/O and event-handling code involves writing numerous function callbacks, sometimes even deeply nested callbacks. Writing vanilla JavaScript is good enough for me; I don’t care to use the popular libraries like jQuery, Angular, React, etc., or safer / more expressive languages like TypeScript that can be compiled down to JavaScript.

C

A straightforward procedural language that is unsafe, well-supported on Unix, good for computationally intensive arithmetic and algorithms, and good for system programming. I like the raw speed and heavy compiler optimizations (especially for numerical algorithms), the Unix friendliness (a C compiler is usually available out of the box), and the clean interfacing with assembly code. I don’t like the poor support on Windows (need to install some distribution of GCC or the massive Microsoft Visual C++ toolset), the undefined behavior (makes arithmetic and so many operations downright treacherous), the bare-bones standard library, the variably sized integer types, the inconvenience of maintaining header files, and the manual memory management (lacking automatic garbage collection).

C++

A baroque object-oriented programming language that is unsafe, good for applications that need a better way to organize data structures and methods, and a good upgrade path for C programs that get too big. I like it for applications where I could use C but where I need more structure and modularity, such as when I need to define numerous custom data structures and methods on them. All the things that I like and dislike about C also apply to C++; additionally I dislike how slow C++ compilation times are, how C++ fixes none of the problems of C, and how C++ adds many new facilities that are near-duplicates of existing C features.

Haskell

A pure functional programming language with a strong emphasis on expressing algorithms in a compact and mathematically pure way. But the high level of abstraction in Haskell makes me constantly feel like I need a PhD in math to read other people’s Haskell code, utilize libraries, or write any working code at all. I like Haskell’s static typing, algebraic data types, and rich standard library. I dislike how terse and opaque the code is when multiple higher order functions are used, how I can’t reason about memory usage in long computational chains due to the lazy evaluation, how writing I/O and imperative code needs the programmer to jump through extra hoops, how unclear the mapping between language constructs and low-level machine operations is (thus frustrating the analysis of execution time and memory usage), and how nothing seems to be straightforward and there is always a new theoretical concept to learn (e.g. continuation-passing style, type classes, infinite data structures, currying, string-appending functions, etc.).

C#

A rich, safe, object-oriented programming language that is good for application programming on Windows. I have spent little time programming in C# and have barely seen the possibilities offered by the language (such as LINQ and GUI design). Due to C#’s blatant similarity to Java (I would say Java SE 1.4 is similar to C# 1.0), I compare its features and behaviors heavily to Java. And because I treat Java as my primary programming language, I try to stay within my comfort zone and write in the subset of C# that is shared with Java (e.g. by avoiding things like delegates and structs). Overall I have a slight negative opinion of C# because it seems to start with Java’s design as a safe and simple C++-like OOP language, and then add tons of features to it that are hard to keep track of (e.g. properties, operator overloading, unsigned types, checked arithmetic, extension methods, partial classes, LINQ). Over the years, it has become clear that Java has taken a much more conservative direction than C# to language extension – it tries to add new features in such a way that they can be easily mapped (de-sugared) into older features. Also I dislike the MSDN documentation for the standard library (i.e. the .NET Framework) because I find that the wording, presentation formatting, and code samples are much less helpful than the Java SE documentation. If I grew up in different circumstances and C# was my main programming language, I probably would know all its features and use them to make my code powerful and concise. But because I care about other programming languages too, I find many of C#’s features to be distractions that increase my cognitive load without bringing sufficient benefits. But all of this only reflects my opinion of C# as a beginner and can’t be taken as a properly informed opinion.

As an actual example of where I weigh programming languages for a specific application, see my Wikipedia PageRank article. There I discuss how the project would be expected to fare in Java, Python, and C++. My choice reflected the best balance between speed, safety, and library convenience for the application at hand.

Personal history of languages

I first learned computer programming more than 10 years ago. Very slowly over the years, I learned and got comfortable with more and more programming languages. The years and grades given in the list below are accurate to about ±1 year, because only in retrospect did I realize how my habits were changing:
