Mad Analogy: development

21 Jun 2012

Rails test coverage: sometimes 100% is just right

DHH, the éminence grise of the Ruby on Rails world, took a swipe at the test-first cult with his provocative article "Testing like the TSA", saying in effect that 100% test coverage is mad, bad, crazy behaviour, worthless, and an overall affront to good taste and a crime against humanity. [I paraphrase.] Since we enforce 100% code coverage at all points through our development process, I want to explain how this does not necessarily make us time-wasting, genital-fondling idiots, how the needs of our business drive our quality strategy, and how this pays off for us.

At Sage our customers demand and deserve the best we can deliver. We are very quality focused because we build accounting solutions in which getting the right answer matters a great deal: perhaps some customers don't care about quality, but ours demonstrably do. Perhaps in some cases time-to-market is much more important than reliability or maintainability: it is a business decision, and there is no one-size-fits-all answer. However, if you're building for the future and want to avoid years of functional paralysis and a costly rewrite, building an application on a solid quality foundation makes a lot of economic sense.

Write less code

The most effective way to maintain 100% test coverage is by writing less code. We refactor like crazy, and we refactor our tests just as much as our code. We don't repeat ourselves. We spend time creating the right abstractions and evolving them. Having 100% test coverage makes it much easier for us to do this: it is a virtuous cycle.

We've been doing Rails development at Sage for five years now, and we've learned a few lessons. Even if you're writing unit tests with 100% code coverage, you're doing it wrong if:

Generators are used to build untested code (i.e. using the default Rails scaffolds to build controllers and views)
Partials are the most sophisticated method of generating views, and they look like PHP or ASP
The tests are harder to understand than the code

What is the alternative? Well, if all of the controllers and views look pretty much the same, factor them out. The Rails generators create enormous amounts of crappy, unmaintainable boilerplate code – every bit as much as a Visual Studio wizard. On the other hand, if the controllers and views are each completely different and unique flowers, is it for a good reason or is the code just a mess? Chances are, if the code looks like a mess, so does the app.

In my experience it's also basically useless to attempt to retrofit unit test code coverage onto a project that doesn't have it: the tests that wind up written are always written to pass, and they rarely help much. I haven't yet seen a project that could be rescued from this situation.

Whom do you trust?

When DHH says that the use of ActiveRecord associations, validations, and scopes (basic Rails infrastructure) shouldn't be tested, he's claiming that Rails is never wrong: not now, not in the future, not ever. It's his choice to make that promise, but it would be irresponsible of us to believe it:

Rails changes all of the time. Sometimes there are even bugs! (Crazy talk, I know!) But ActiveRecord associations and scopes are complex and ornery, and can easily be broken indirectly (through a change elsewhere in the code).
Because we operate on the Internet, new security risks and fixes appear constantly: zero day attacks are real. We need to react to these threats quickly, and being able to prepare and deploy new versions of our apps based on updated components immediately is crucial. Having a robust test suite makes it much cheaper and less stressful to implement these changes, which drives down technical debt and makes development more responsive, and oh yeah, helps prevent a costly rewrite.
We use components that extend and complement the behaviour of Rails. DHH calls out the example of testing validations as particularly useless. Well, what about when the validation methods change in a Rails upgrade? Or you want to adopt a new plugin that changes core Rails behaviour? Or you want to refactor an application to move validation to a more useful place? In all of those cases the tests on validation code would be useful.

Often this means a function in a spec mirroring a function in a model (but with enough difference in naming and syntax to be truly maddening). Yes, this feels stupid sometimes, but it is a very cheap insurance policy, and sometimes it pays off.
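To make the "spec mirroring a model" idea concrete, here is a minimal sketch in plain Ruby. Everything in it is an assumption made for illustration: `CreditNote` is a hypothetical stand-in for a model, its hand-rolled `valid?` stands in for Rails validations, and `check_credit_note_validations` stands in for a spec file.

```ruby
# Hypothetical stand-in for a Rails model. A real app would declare
# these rules with ActiveRecord validations rather than by hand.
class CreditNote
  attr_accessor :title, :amount
  attr_reader :errors

  def initialize(title: nil, amount: nil)
    @title = title
    @amount = amount
    @errors = []
  end

  # Rebuilds the error list on every call and reports whether it is empty.
  def valid?
    @errors = []
    @errors << "title can't be blank" if title.to_s.strip.empty?
    @errors << "amount must be greater than 0" unless amount.is_a?(Numeric) && amount > 0
    @errors.empty?
  end
end

# The "mirror": one check per validation rule, plus one happy path.
# Returns a list of failure descriptions; an empty list means all is well.
def check_credit_note_validations
  failures = []
  failures << "accepts a blank title" if CreditNote.new(title: "", amount: 10).valid?
  failures << "accepts a zero amount" if CreditNote.new(title: "Sales Credit Note", amount: 0).valid?
  failures << "rejects a valid note" unless CreditNote.new(title: "Sales Credit Note", amount: 10).valid?
  failures
end
```

The checks repeat the rules almost word for word, which is exactly the maddening part. But if an upgrade or a plugin changes how a rule behaves, the mirror is what notices.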

Time split

DHH says that you shouldn't be spending more than 1/3 of your time writing tests. This leads to a question: how are you characterizing your time? Is the person doing the implementation also the person making design decisions? If you are doing behaviour-driven development you are actually vetting the requirements at the time you write the tests, so is it a good idea to skip that part and move on to the coding? If you spend time refactoring tests to speed up the test process, should that be counted? Should the time spent writing tests before fixing bugs be counted? Have you decided to outsource quality to a bunch of manual testers? What is your deployment model? I'm reluctant to put a cap on the time writing tests. I find this metric as useful as dictating the time spent typing vs. reading, or the amount of time thinking vs. talking: my answer is not yours, and the end result is what matters.

Risk assessment

We enforce 100% test coverage because it ensures that no important line of code goes completely untested. One can decide to write tests for "important" code and ignore the "unimportant" code, but unfortunately a line of code only becomes "important" after it has failed and caused a major outage and data loss. Oops!
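As a rough sketch of what "enforcing" can mean mechanically, the check below fails a build when any executable line was never run. The post does not say which tool does this at Sage; the input shape assumed here is the legacy per-file array returned by Ruby's standard `Coverage.result` (one entry per source line, `nil` for non-executable lines, otherwise a hit count), and both method names are hypothetical.

```ruby
# Returns the 1-based numbers of executable lines that were never run.
# `counts` has one entry per source line: nil for non-executable lines,
# otherwise the number of times the line was executed.
def uncovered_lines(counts)
  counts.each_with_index
        .select { |hits, _index| hits == 0 }
        .map { |_hits, index| index + 1 }
end

# `results` maps a file name to its counts array.
# Raises with a per-file report unless every executable line was hit.
def enforce_full_coverage!(results)
  gaps = results.map { |file, counts| [file, uncovered_lines(counts)] }
                .reject { |_file, lines| lines.empty? }
  return true if gaps.empty?

  report = gaps.map { |file, lines| "#{file}: lines #{lines.join(', ')}" }.join("\n")
  raise "Coverage below 100%:\n#{report}"
end
```

The point of a hard threshold is that nobody has to argue about which lines count as "important": the report simply lists every line that has never been exercised.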

DHH avers that the criticality and likelihood of a mistake should be considered before deciding to write a test about something. However, this ignores the third criterion: cost. Is it cheaper to spend time deciding the criticality and likelihood of writing vs ignoring tests for every single line of code, or is it cheaper to just write the stupid test and be done with it? Given the cost of doing a detailed long-term risk analysis on every line of code, does anybody ever really do it, or is the entire argument just an elaborate cop-out? The answer gets a lot clearer once you elect to write a lot less code, and it gets easier once you resign yourself to learning a new skill and changing your behaviour.

Closing

Code coverage is a great way to measure the amount of exposure you have to future changes, and depending on your business, it might be necessary to have 100% coverage. A highly respected figure speaking ex cathedra can be very wrong when it comes to the choices you need to make, and sometimes it shows. 100% code coverage may seem like an impossible goal, especially if you've never seen it done. I'm here to tell you it's not impossible: it's how we work, and in our case it makes a lot of sense.

13 Jun 2012

Rails i18n translations in Yaml: translation tool support

With Rails 2.2 the i18n API was introduced with a new method for translations. Instead of embracing the venerable gettext which had been the previous standard, the Rails team invented a new way using Yaml files. The result is a particularly graceful, flexible and very Rubylike way of specifying translations. It also is much more reliable than gettext, which had many inscrutable issues with locales and caching, and sometimes caused people to get things in the wrong language. So: bravo, great job.

But to do this, they specified their own translation format, the very flexible Yaml file. There are already a lot of formats floating around, and translation tool vendors and open-source translation developers have been working for a long time on conversion tools between them. The Translate Toolkit and Pootle, which emerged from South Africa (a country which ~~groans beneath the weight~~ revels in the glory of eleven official languages), provide an excellent web-based tool for collaboration, centered around gettext PO files. However, poor little Pootle started a migration from Python to Django, and we all know how rewrites go. [Halfway. Badly.] But Translate Toolkit supported a lot of formats:

moz2po - Mozilla .properties and .dtd converter. Works with Firefox and Thunderbird
oo2po - OpenOffice.org SDF converter (See also oo2xliff).
odf2xliff - Convert OpenDocument (ODF) documents to XLIFF and vice-versa.
prop2po - Java property file (.properties) converter
php2po - PHP localisable string arrays converter.
sub2po - Converter for various subtitle files
txt2po - Plain text to PO converter
po2wordfast - Wordfast Translation Memory converter
po2tmx - TMX (Translation Memory Exchange) converter
pot2po - initialise PO Template files for translation
csv2po - Comma Separated Value (CSV) converter. Useful for doing translations using a spreadsheet.
csv2tbx - Create TBX (TermBase eXchange) files from Comma Separated Value (CSV) files
html2po - HTML converter
ical2po - iCalendar file converter
ini2po - Windows INI file converter
json2po - JSON file converter
web2py2po - web2py translation to PO converter
rc2po - Windows Resource .rc (C++ Resource Compiler) converter
symb2po - Symbian-style translation to PO converter
tiki2po - TikiWiki language.php converter
ts2po - Qt Linguist .ts converter
xliff2po - XLIFF (XML Localisation Interchange File Format) converter

On their heels, Google introduced the Google Translate Toolkit, which lets you use the Google Translate engine to suggest translations (based on its own databases or translation memories you provide). It also does the core of what Pootle does (collaboration and access) but without the crashing and flakiness, and it works with:

AdWords Editor Archive (.aea)
Android Resource (.xml)
Application Resource Bundle (.arb)
Chrome Extension (.json)
GNU gettext (.po)
HTML (.html)
Microsoft Word (.doc)
OpenDocument Text (.odt)
Plain Text (.txt)
Rich Text (.rtf)
SubRip (.srt)
SubViewer (.sub)

But neither of them supports Yaml files. Unfortunately, tooling support libraries have not embraced this format in the intervening two and a half years. I did find one solution: i18n-translators-tools, which supports conversion between Yaml and gettext PO files, but it's somewhat idiosyncratic, and it turns out there's a good reason why there isn't a straightforward Yaml ←→ PO converter: the PO format consists of name-value pairs with metadata, and the Yaml format is a tree.

English source Yaml file

page_info:
  sales/credit_notes:
    date: "Date"
    title:
      default: "Sales Credit Note"
      new: "New Sales Credit Note"

Spanish Yaml file produced by i18n-translators-tools from a PO file

page_info:
  sales/credit_notes:
    date: "Fecha"
    title:
      default:
        default: "Sales Credit Note"
        translation: "Crédito de venta"
      new:
        default: "New Sales Credit Note"
        translation: "New Sales Credit Note"

There are some interesting things going on here: the Spanish Yaml file provides fallbacks so untranslated strings don't come through as blank. The intermediate gettext PO file keeps the tree structure in the msgctxt metadata, and looks like this:

msgctxt "page_info.fuji_sales/sales_credit_notes.title.default"
msgid "Sales Credit Note"
msgstr "Crédito de venta"

msgctxt "page_info.fuji_sales/sales_credit_notes.title.new"
msgid "New Sales Credit Note"
msgstr "New Sales Credit Note"
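The tree-to-pairs mapping can be sketched in a few lines of Ruby. This is only an illustration of the idea, not how i18n-translators-tools is implemented: `flatten_translations` and `to_po` are hypothetical helpers, the output has empty `msgstr` values because it represents an untranslated template, and quotes inside strings are not escaped.

```ruby
require "yaml"

# Walk a nested translation hash and return { "dotted.path" => "text" },
# so that each leaf becomes one name-value pair.
def flatten_translations(tree, prefix = nil)
  tree.each_with_object({}) do |(key, value), flat|
    path = prefix ? "#{prefix}.#{key}" : key.to_s
    if value.is_a?(Hash)
      flat.merge!(flatten_translations(value, path))
    else
      flat[path] = value
    end
  end
end

# Render one PO entry per leaf, carrying the tree path in msgctxt.
# Simplification: no escaping of quotes or newlines in the text.
def to_po(flat)
  flat.map do |path, text|
    %(msgctxt "#{path}"\nmsgid "#{text}"\nmsgstr ""\n)
  end.join("\n")
end

SOURCE = <<~YAML
  page_info:
    sales/credit_notes:
      date: "Date"
      title:
        default: "Sales Credit Note"
        new: "New Sales Credit Note"
YAML

flat = flatten_translations(YAML.safe_load(SOURCE))
puts to_po(flat)
```

Because the full path survives in `msgctxt`, the tree can be rebuilt later by splitting each path on the dots, which is what makes the round trip through a PO-only tool possible.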

So it's possible to use Google Translate Toolkit to translate your Rails Yaml files, provided you use the i18n-translators-tools library to do the conversions, and configure your Rails applications to support fallbacks.
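To show what the fallback pairs in the Spanish file amount to, here is a small Ruby sketch that collapses each `default`/`translation` pair into a single string, preferring the translation and falling back to the default when it is blank. This is my own illustration of the data shape shown earlier; it is not code from Rails or from i18n-translators-tools, and the helper names are hypothetical.

```ruby
# A leaf is either a plain string or a pair of the form
# { "default" => ..., "translation" => ... } (shape taken from the Spanish file).
def fallback_pair?(node)
  node.is_a?(Hash) &&
    node.keys.sort == %w[default translation] &&
    node.values.none? { |value| value.is_a?(Hash) }
end

# Recursively replace every fallback pair with one string:
# the translation if present, otherwise the default text.
def collapse_fallbacks(node)
  if fallback_pair?(node)
    translation = node["translation"].to_s
    translation.strip.empty? ? node["default"] : translation
  elsif node.is_a?(Hash)
    node.transform_values { |child| collapse_fallbacks(child) }
  else
    node
  end
end

SPANISH = {
  "page_info" => {
    "sales/credit_notes" => {
      "date" => "Fecha",
      "title" => {
        "default" => { "default" => "Sales Credit Note", "translation" => "Crédito de venta" },
        "new" => { "default" => "New Sales Credit Note", "translation" => "New Sales Credit Note" }
      }
    }
  }
}

collapsed = collapse_fallbacks(SPANISH)
```

Note that `title` itself has a `default` key but is not treated as a pair, since its other key is `new`; only hashes with exactly the two pair keys and string values are collapsed.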

17 Apr 2012

Homogeneous web development: Meteor, Derby, Firebase and the portents of doom

A variety of new web frameworks are being cooked up that allow you to write one set of seamless code for the client and server. It's a problem that has haunted the web development community since the dawn of JavaScript and the DOM. One approach is to basically define the database operations on the client. Does that sound like a good idea, or does that sound like a great idea?

Meteor

Exposes the MongoDB API directly on the client to work on automatically-synced data subsets. What could possibly go wrong? Let's name the project after a flaming ball of rock and find out for sure!

Derby

Is client-side MVC too confusing? Is Node.js too immature? Let's combine them and see what happens! (It remains to be seen whether Derby is named after a hipster hat or a county fair event.)

Firebase

"We have a full security system in the works that will allow you to control read and write access on individual locations in Firebase on a per-user basis. However, it’s not ready for widespread use yet, so right now all data in Firebase is publicly accessible. Please keep this in mind when building apps! Please contact us if you need security or want to be one of the first to try out the new system." ^*

Despite my scornful tone, I'm actually very optimistic about these technologies and very hopeful that at least one of them will ultimately be successful. I'm also really happy that I'm not going to be the first person trying to build an application on this stuff. Given the theme of the project names, it's fair to say that most early adopters will get burned.

* Yes, that's a direct quote.

8 Nov 2011

AGPL revisited: how MongoDB licensing differs from MySQL

Now that the Affero General Public License (AGPL3) is actually being used by successful projects, I'm looking at it again. Specifically, MongoDB is AGPL3 licensed, and it is being used for commercial applications. But how?!? I thought the AGPL was complete communism, and that's what excited me so much about it - one touch of the brush, and the whole batch of milk is stained vermillion, and your entire enterprise now belongs to Richard Stallman so he can use it to fund GNU HURD.

The AGPL actually has some pretty fixed boundaries:

A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an "aggregate" if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate.

Upon reflection, the AGPL isn't as restrictive as I once thought. Let's take what I consider to be the most successful GPL (v2) product: MySQL*, and consider what would have happened if it had been released under AGPL instead. Since Amazon used MySQL code to build RDS, under the AGPL Amazon would be forced to release the code they use to provide the RDS service. They would not be forced to release the code for Amazon.com** however: that would clearly be outside the boundaries set out in AGPL.

Also consider that Facebook uses MySQL internally, with something like 4000 MySQL databases to power much of their site, and they've made many changes to MySQL in order to make that possible, some of which they've made public. If MySQL had been AGPL-licensed, they would have been required to make those changes publicly available under the same license.

Google is also reportedly one of the largest users of MySQL, and in a similar spirit they have released some of their tools. However, they released these tools under the more permissive Apache 2.0 license: if MySQL had been released under AGPL3, Google would most likely have been forced to release these tools under AGPL3 as well.*** And now that Google is also offering Google Cloud SQL made with GPL-based MySQL, they don't have to share their work as they would if MySQL were AGPL3-based.

All of this to say: if you want to use MongoDB to power a web app, have fun: the boundaries within the AGPL3 are there to help you, and probably won't require you to hand over your code to every visitor. However, if you see MongoDB and think "hey, that's cool, I'm going to offer a web service with the MongoDB API and become a cloud provider of NoSQL data storage, just like Amazon SimpleDB" then you will have made a derivative work, and you'll have to share those changes with the world under AGPL3.

Finally, IANAL, not in any jurisdiction, and if you base your legal strategy on lay analyses found on personal blogs, then sadly you're not alone and you're in very risky company. Best of luck, however, in finding a copyright attorney who will dig through these issues for you and give you an opinion for less than $500k.

* The Linux kernel is more widely used than MySQL, but it's so mixed up with other licences that it can't just be GPL anymore, not honestly - and the copyrights are owned by so many different people that nobody can claim ownership. MySQL, on the other hand, was always extremely diligent about maintaining ownership of every line of code they include in their distribution (which made acquisition by Sun and Oracle all the more attractive).
** ... that is, provided Amazon.com was built using MySQL, which it isn't AFAIK.
*** They could still license their code any other way they want as well, as they own it, but they'd also be required to license it under AGPL3.

26 Nov 2010

OSX as a Ruby on Rails dev environment: Package Managers

A lot of Rails developers like to use OSX as their development platform. Although everybody hosts Rails apps on Linux (or Solaris under duress) lots of people love OSX for its productivity, clean interface, and most importantly, its typography.

However, as some have noted, setting up Rails on a Mac is hardly a frictionless process. Unlike Linux distros, OSX has no built-in package manager; you get your version of OSX and you get your patches and you'd better like what you get, because every app is going to be updated when Apple or the vendor feels like updating it. This is the same as the Windows world, and it's ugly.

So a couple of efforts have stepped in to fill this void: MacPorts and Homebrew. Neither of these is going to feel like a complete solution if you're used to a package manager like APT or YUM, but they do at least automate the installation process for various open source packages. After all, when you want wget there's no reason you should have to find the website.

I'll start with MacPorts since that came first. MacPorts was inspired by BSD Ports; it is built in Tcl and C and contains a very complete set of available packages. It is quite popular and is the venerable incumbent. And personally, I hate it. I've had my OSX install ruined twice while using MacPorts, just by installing system updates; although I obviously did something wrong, it just isn't a robust solution. If MacPorts is the solution, I don't want to hear the question.

Another alternative is Homebrew, a new Ruby-based system developed on Github. It has been around for less than two years, and it's a very active project with a lot of contributors. It stresses extensibility, and lots of recipes have been written to support various packages - predictably, those most popular with Rails developers. Although I don't think it solves the brittleness problem MacPorts suffers from (it doesn't address operating system component and library version dependency issues), it is very actively developed, focused on the Rails world, and easily customizable to meet individual needs.

So, although you're probably not going to get set up with a Rails development environment with OSX as quickly as you would on Ubuntu (despite Ruby being included in Xcode), there are good solutions to keep you from pulling your hair all the way out. Which will bring you to the point where you can enjoy and appreciate the kerning on the fonts in TextMate as you write your Rails code.

7 Sept 2010

Whether the test or the code is more important depends solely on who has to write the tests

In the world of open source development, automated tests are like gold. They're the glue that makes it easy to maintain projects with hundreds of collaborators. When they don't exist, code dies, and nobody knows about it - that would be a bad thing, so preventing it is job one. Unless of course, it means you actually have to write tests for your code, in which case it's delegated just as far down the food chain as possible. And nothing's further down the food chain than a paying customer who's already paid you.

Let's say you paid the author/owner of an open source project to add support for something you need. Let's just say that it's something you need, but that would be useful to him/her as well as others. And let's just say that s/he puts that code in his/her distribution. And you pay him/her for his/her time and effort. All is good.

Then, about six months later, you discover a bug in his/her library, in the very code that you paid him/her to write. You fix the code, and that's a good thing, because you really need it to work. It really should have worked in the first place, but oh well. Shit happens, right?

So then let's say that the code is all hosted on Github [because this hypothetical case happened in 2009/2010 and anything worthwhile is being hosted on Github], that you branched the main project, made your change, and committed it. Then you send a pull request to the project maintainer explaining the situation. Beautiful, this is exactly how open source is supposed to work. Git is wonderful, Github is fantastic, and everything just works because of it.

And you get the answer back from a minion of the author that you paid: "well, no, we can't accept this change, because you see, there is no test that was broken in the first place, and no new test has been written to prove that this change is a good one. So go write a test and then we'll think about it."

Which strikes you as a bit odd, because hypothetically, if s/he wrote the code in the first place, and s/he/they is/are such [a] holy motherfucking test-first code ninja[s], s/he would never write a line of code without tests for it. Except the evidence is in the code that never worked in the first place (despite your having, er, paid for it).

So now you have to maintain your own branch of this stuff in perpetuity, because they have rules, you see, and standards, and these rules and standards say that they won't accept changes that don't fix tests. Oh, and by not accepting your fix they're actually hurting your reputation, because the fact that the stuff they wrote for you doesn't work with their library might look like it's your fault, not theirs. But you paid them, and it's all over now. Unless you want to try to write a test for them, which they'll consider accepting.

What you might expect from the maintainer would be an apology, a gracious acceptance of the fix, and for him/her to write the test s/he should have written in the first place (if that's what makes him/her so goddamn happy).

Purely hypothetically speaking. I mean, it would be totally inappropriate to name names if this actually happened.

11 Jul 2010

Adobe Flash: Just because Steve Jobs says it's bad doesn't mean it's good

Steve Jobs' self-serving Thoughts on Flash were controversial to say the least. Yes, he was hypocritical and self-serving (as usual), but he certainly wasn't wrong.

Adobe wants everyone to treat Flash as if it is an open standard, but they haven't made it open source. They made some parts of it open source, but not the parts that matter - and as a result, developers are constantly left wondering which platforms are going to work.

@cyanogen on Twitter: Flash doesn't work because it uses a native (non-portable) library which uses ARMv7 instructions. It can't run on older processors.

As a friend said, "Apple seems just as evil as Microsoft, just not as successful. And Jobs seems even more evil than Bill Gates. Certainly a bigger bastard." I totally question Steve Jobs' motives in wanting to crush Flash, but I don't think Adobe deserves a great deal of sympathy.

18 Jan 2010

Rogers tells HTC Dream users to turn off GPS or 911 calls won't go through

On January 15 I received an SMS message from Rogers telling me I'd better disable GPS on my phone or I wouldn't be able to make 911 calls. This is the latest chapter in the unhappy saga of the HTC Dream on Rogers.

Rogers/Fido service message: URGENT 911 Calls: Please disable GPS location on your HTC Dream device to ensure all 911 calls complete. HTC is urgently working on a software upgrade and we will provide details shortly so you can re-enable GPS.

Instructions: Select Menu - Select Settings - Select Location - Uncheck Enable GPS Satellite

Message de Rogers/Fido : URGENT - Appels 911 : Veuillez désactiver la localisation GPS sur votre appareil HTC Dream afin de vous assurer que tous les appels 911 soient acheminés. HTC développe le plus rapidement possible une mise à jour du logiciel et nous vous fournirons les détails sous peu afin que vous puissiez réactiver la fonction GPS.

Instructions : Sélectionner Menu - Sélectionner Paramètres - Sélectionner Location - Désactiver les satellites GPS

First Rogers announces that they're not providing any more upgrades to the software on this platform. Then they announce that they'll upgrade Dream users to the HTC Magic for free (well, with a contract extension). Then the damn thing just doesn't work. Ah, the joys of early adoption...

I just want an Android device with a keyboard. Is that too much to ask?

4 Nov 2009

The cross-species dinosaur team meets to discuss the ideal design for a mammal

4 Oct 2009

Distributed is the new Object Oriented

In the 80s, Object Oriented development promised a fundamental reshaping of the software development landscape, and it had distinct religious overtones. (You can tell it was religious because Object Oriented is capitalized.) It was going to be better in every way than procedural programming - everything would be reused, bugs would be eliminated, and mass love would result. Like Theravada Buddhism, once you accepted the Four Noble Truths of Encapsulation, Inheritance, Polymorphism, and Modularity everything else followed. This fever gripped the development world for twenty years, and thousands of developers never made the mental shift necessary to embrace it.

Leaders often made the fateful decision to rewrite existing procedural apps in object oriented technologies. Did the resulting programs run better? Um, no. Did they conquer the marketplace? God no. Did they run faster? Hell no. Windows Vista is a prime example; I'm not going to rehash any personal case histories because the pain is still too great. I'll let you know when I'm strong enough to cry.

Distributed development is as different from Object Oriented as Object Oriented is from procedural development. Most of the existing cadre of developers will never get this stuff, just as most procedural developers never figured out OO. Hadoop / MapReduce and Erlang require a rethinking of how problems should be solved, and a rethinking of what problems can be solved. Instead of figuring out how to best rewrite yesterday's apps with today's technologies, it's much better to treat them as solved problems and move on.