Pete Hodgson

Software Delivery Consultant

Primitive Obsession obsessions

February 22, 2010

After seeing the Primitive Obsession code smell crop up in a few different places recently I got to thinking about it in the context of static and dynamic languages.

Essentially this code smell refers to using a primitive data type (int, string, etc), or a cluster thereof, where you could instead be using an explicitly defined type. Probably the most common variant of Primitive Obsession would be a method like:

def is_point_within_bounds( x, y )
# ...
end

I would call this variant Primitives Travelling in a Pack. You'll probably see these two x and y values hanging out together all over the place. Often this variant will be addressed with the Introduce Parameter Object refactoring, leading to something like:

class Point < Struct.new(:x,:y)
end

def is_point_within_bounds( point )
# ...
end

This is a pretty uncontroversial refactoring. It tends to make the code easier to read, and is often the initial stage in the 'budding of' of a full-fledged class (I believe I have the wonderful GOOS book to thank for that term). Once the class is there, it often becomes a locus for a little cluster of methods, often pulled in from surrounding client code. This is a powerful way of letting your design evolve based on needs.

There are other more subtle variants of Primitive Obsession. Some people would say that two methods like:

public class Rental {
  void increaseRentalPeriod( int numWeeks ) {
   //...
  } 

  int getRentalPeriod() {
   //...
  }
} 

would be better refactored to:

public class WeekSpan {
    public int numWeeks;
}

public class Rental {
  void increaseRentalPeriod( WeekSpan period ) {
   //...
  } 

  WeekSpan getRentalPeriod() {
   //...
  }
} 

Applying this refactoring can again be the first step in budding off a new class. However I would say that the main advantage of this refactoring is the explicit typing which it adds when in a statically typed language. Before refactoring there was an implicit typing going on that said "ints in this context actually represent a time period, in weeks". After the refactoring that typing has been made explicit. One outcome of that is that type errors which were previously implicit are now explicit, and will be caught by the type system. Before the refactoring you could have written client code like

someRental.increaseRentalPeriod( numDaysToIncreaseRentalBy() );

and your compiler wouldn't utter a squeak, despite the fact that you're passing in a number of days, rather than the number of weeks which the method expects. With the explicit WeekSpan type, that error wouldn't sneak through.

This explicit typing also helps with future refactoring. If we later decided that it would be better to represent the rental period using a number of days rather than a number of weeks we could use a automated refactoring tool to trivially replace all instances of WeekSpan with DaySpans (or we could just refactor WeekSpan itself into a generic TimePeriod). If we didn't have the explicit WeekSpan class then we'd be reduced to manually searching for all the instances of the primitive int type which happened to actually be representing a time period in weeks.

This is all good stuff, and makes an awful lot of sense in a static language. However, I personally am not convinced that introducing these single-field data types makes as much immediate sense in a dynamically typed language. Both of the typing advantages are lost in a dynamic language. We don't know the type of anything until runtime so we can't check for type violations (however our unit tests help with that), and our refactoring tools can't take advantage of static type information.

The other advantages of introducing these non-primitive types remain, but I would err towards the YAGNI principle. Leave the primitives where they are, and make a mental note that they might need some attention in the future. Once you see a need to move to a next step in the budding off process (e.g. moving a method into the nascent data type), then that's the time to apply the refactoring. Doing so just for the sake of having an explicit type smacks of premature optimization to me.

Postscript

It's interesting to note that some languages (such as Google's Go) allow you to declare an explicit type which simply aliases a primitive. For example in Go you could write

type WeekSpan uint;

This might be the best of both worlds. You get the explicit typing and attendant type safety, without the potential overkill of creating an new single-field class. If you later discovered that budding off a new class was required it would be trivial to refactor that semi-primitive WeekSpan type into a full fledged WeekSpan class.