Thursday, October 18, 2007

The ever-expanding DSL syndrome

A DSL is a Domain-Specific Language, a language designed for a particular application domain. You probably already know a few dozen DSLs: think of HTML, CSS and JavaFX (web), SQL (databases), UML (modeling), COBOL and PL/1 (financial applications), Fortran and Matlab (scientific applications), AutoCAD (3D design), PostScript (page design). Some more specialized DSLs are known only within their communities: MLFi (finance), OPL, AMPL and GAMS (optimization), Lex and Yacc (parsing). Thousands of DSLs exist today, and it is hard to imagine developing a software application without them. DSLs are important because they let you express the concepts you have in mind quite directly.

Most DSLs are subject to what I call the ever-expanding DSL syndrome: they come into existence as "small" languages designed to express the concepts of one application domain (they are often designed in-house by the domain experts, not by language specialists). But DSL users soon feel the need to write arbitrary expressions, to have access to more and more library functions, to access databases and web browsers, to handle programming-in-the-large via e.g. powerful type systems and modularization, to have better tool support, and so on.

As a result, DSL users are always waiting for a new feature, while DSL developers try to catch up by expanding the language and/or adding tool support. They get caught in a never-ending spiral, in effect developing a general-purpose language with a full-blown programming environment. The resulting language is obviously incompatible with anything that already exists, and is quite often plagued with quality problems and poor tool support.

As an example, let us look at the new features advertised for existing modeling languages. The following excerpts are taken from the respective vendors' websites: you will note that all the features mentioned here are unrelated to the domain of modeling, and are already present in mainstream general-purpose languages.


  • AIMMS :

    • Web services

  • AMPL :

    • Character strings
    • Database access
    • Looping and testing (writing "scripts")
    • Reporting and display

  • GAMS :

    • Conditional statements

  • OPL :

    • Connection with spreadsheets and relational databases
    • Scripting
    • Interactive development environment
    • External function calls

In contrast, Ateji believes in designing DSLs that are "large" languages, namely DSLs designed as extensions of mainstream general-purpose languages. Rather than starting with a few domain-specific concepts and progressively adding features, we start with a full-featured general-purpose language and add domain-specific concepts.

The difference is striking: you will never again complain that your DSL doesn't let you add 1+1 (think CSS), since it already has all the features of a large programming language. Additional benefits are integration at the language level (DSL code and application code work on the same objects) and the availability of state-of-the-art development environments and tools.
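To make the integration point concrete, here is a minimal sketch in plain Java. It is not OptimJ syntax and only approximates the idea at the library level: a small domain concept (a linear expression) is hosted inside a general-purpose language, so it can use ordinary arithmetic and work directly on application objects. All names (`HostedDsl`, `Product`, `LinearExpr`, `totalMargin`) are hypothetical and chosen for illustration only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a domain concept hosted inside plain Java.
class HostedDsl {

    // An ordinary application object, not something the DSL had to define.
    static final class Product {
        final String name;
        final double price;
        final double cost;

        Product(String name, double price, double cost) {
            this.name = name;
            this.price = price;
            this.cost = cost;
        }
    }

    // The domain concept: a linear expression, i.e. a sum of coefficient * variable.
    static final class LinearExpr {
        private final Map<String, Double> terms = new LinkedHashMap<>();

        // Adds coef * variable; repeated variables accumulate their coefficients.
        LinearExpr plus(double coef, String variable) {
            terms.merge(variable, coef, Double::sum);
            return this;
        }

        // Evaluates the expression for given variable values (missing ones count as 0).
        double evaluate(Map<String, Double> values) {
            double total = 0.0;
            for (Map.Entry<String, Double> term : terms.entrySet()) {
                total += term.getValue() * values.getOrDefault(term.getKey(), 0.0);
            }
            return total;
        }
    }

    // Builds the objective directly from application objects.
    static LinearExpr totalMargin(List<Product> products) {
        LinearExpr objective = new LinearExpr();
        for (Product p : products) {
            // Ordinary host-language arithmetic, no special DSL feature required.
            objective.plus(p.price - p.cost, p.name);
        }
        return objective;
    }

    public static void main(String[] args) {
        List<Product> catalog = Arrays.asList(
                new Product("chair", 30.0, 20.0),
                new Product("table", 50.0, 30.0));

        Map<String, Double> plan = new HashMap<>();
        plan.put("chair", 2.0);
        plan.put("table", 1.0);

        // Margin of the plan: 2 * (30 - 20) + 1 * (50 - 30)
        System.out.println(totalMargin(catalog).evaluate(plan));
    }
}
```

A language extension would give the domain concept its own syntax and tool support rather than a method-chaining API, but the sketch shows the two benefits named above: expressions such as `p.price - p.cost` come for free, and the model reads the same `Product` objects as the rest of the application.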

DSLs designed as extensions of mainstream languages are also quick to learn. If you know the mainstream language, you only have to learn a few additional concepts. If you're a domain expert, you'll find a familiar language expressing the concepts of your domain. In both cases, you won't need to learn yet another way of writing 1+1.

Thursday, October 11, 2007

Preparing for Seattle

Ateji will have a booth at the INFORMS conference in Seattle, Nov. 4th to 8th. This is our first booth abroad, and we'll be introducing our OptimJ language to the operations research community.

When you discover all the work this implies, developing software looks easy in retrospect. Of course, we first made sure the product works fine by running a large-scale beta-test program. Software developers can still handle this.

But then you need to prepare brochures (how many?), brush up your English (can you ripit plize?), prepare your speech for the plenary session (sorry everybody, I promise, my demo used to work until 5 minutes ago), make reservations for chairs and tables (apparently cheaper to buy and throw away than to rent for 4 days), understand the union regulations (it is strictly forbidden to carry your own luggage yourself between the entrance door and the booth, but the 10,000 km before reaching the entrance door are ok), think about the booth décor, print posters, try to find an insurance company for civil liability (French companies simply don't want to insure an event abroad; I'm still looking, in case you know someone who can help), set up appointments with the press, and most important, research what kind of French sweets would be most successful in attracting prospective customers to our booth (provided customs don't consider them potentially lethal).

I used to work as a researcher, and even published a few involved theorems. A breeze. I managed the transition to becoming an engineer and now an entrepreneur. Cool. But setting up a booth at a conference is about to knock me down.

Well, I'll sleep on the plane. See you in Seattle!



Sunday, October 7, 2007

Code generators

I have often been asked, "After all, isn't what you provide a code generator?" Well, yes and no.

Yes, because every compiler is a code generator. Whether you generate assembly code, virtual machine instructions or source code doesn't make much of a difference as far as execution of the program is concerned.
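As an illustration of that claim, the following sketch takes one tiny expression tree and generates code for two different targets: Java source text and instructions for an imaginary stack machine. It is a toy written for this post, not a description of how Ateji's compilers work; the class names and the `PUSH`/`ADD` instruction set are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini-compiler: one expression tree, two code generation targets.
class TwoBackends {

    // A tiny source language: integer literals and additions.
    interface Expr {}

    static final class Num implements Expr {
        final int value;
        Num(int value) { this.value = value; }
    }

    static final class Add implements Expr {
        final Expr left;
        final Expr right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    // Backend 1: generate Java source text.
    static String toJavaSource(Expr e) {
        if (e instanceof Num) {
            return Integer.toString(((Num) e).value);
        }
        Add a = (Add) e;
        return "(" + toJavaSource(a.left) + " + " + toJavaSource(a.right) + ")";
    }

    // Backend 2: generate instructions for an imaginary stack machine.
    static List<String> toStackCode(Expr e) {
        List<String> out = new ArrayList<>();
        emit(e, out);
        return out;
    }

    // Post-order traversal: operands first, then the operator.
    private static void emit(Expr e, List<String> out) {
        if (e instanceof Num) {
            out.add("PUSH " + ((Num) e).value);
            return;
        }
        Add a = (Add) e;
        emit(a.left, out);
        emit(a.right, out);
        out.add("ADD");
    }

    public static void main(String[] args) {
        Expr onePlusOne = new Add(new Num(1), new Num(1));
        System.out.println(toJavaSource(onePlusOne)); // (1 + 1)
        System.out.println(toStackCode(onePlusOne));  // [PUSH 1, PUSH 1, ADD]
    }
}
```

Both backends walk the same tree and differ only in what they write out, which is the sense in which the choice of target matters little for execution.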

No, because the words "code generator" convey the idea that the "true" source code is the generated code, not the one you wrote. If you have ever played with a code generator, you have certainly noticed how little support there is for your original source code. You have probably felt the need to patch the generated code, and you have probably complained about the lack of tool support (think about debugging) at the original source level.

At Ateji we do indeed generate source code, for one specific reason: generating source code lets us reuse all the existing software engineering tools and techniques available in the Java ecosystem. It would actually have been easier to generate byte code for the JVM directly. But you never actually see the generated code: our languages extend Java or other mainstream general-purpose languages, so you won't ever need to patch the generated code. Our languages are integrated at the IDE level, so that you always work directly with the original source code that you wrote.

This is why we never use the term "code generator" when referring to our products: the generated source code does indeed exist, but it is only an engineering artefact.

Friday, October 5, 2007

What's in a name


Choosing Ateji® ("ah-teh-gee") as a company name came quite naturally: it relates to my personal experience (I used to live in Japan, and even spent some time teaching Japanese), and it reflects quite well the goal we are trying to achieve.

Ateji is a Japanese technique for associating ideographic characters with words (http://en.wikipedia.org/wiki/Ateji). A computer scientist would say associating syntax with semantics.

When the Japanese began importing Chinese characters, they had basically two choices: import the Chinese reading (more precisely, a Japanized version of the Chinese pronunciation) together with the characters, or use the Chinese characters to denote the existing Japanese words with their existing pronunciation. Both approaches are still common today.

But the two languages do not always agree on what a word is. 'Otona' is the original Japanese word for adult, written with the two Chinese characters 'Big' + 'Person': there is no way to cut 'otona' into two pieces in order to account for the two characters. This is the typical example of an ateji. Another kind of ateji is a sort of rebus, where unrelated characters are used on purpose to introduce some additional nuance. 'Kurabu', written with the kanji 'Ku' (together), 'Ra' (fun) and 'Bu' (group), is a word created at the end of the 19th century to convey the meaning of 'club' while preserving a sound close to the original English pronunciation.

As you see, bringing together sound (syntax) and meaning (semantics) can be quite tricky, but it can also provide deep insight when the two are cleverly combined. This is precisely what we are trying to do at Ateji®: design languages where you can express what you need to express, bringing the important semantic concepts to the language level, while making sure syntax doesn't get in the way.