From March 2013
Feature:

Object Oriented Programming 101

Or, How I Learned to Stop Worrying and Love the Interface

Drupal 8 is in hot development, with this massive release expected [soon]. Among other changes, Drupal 8 represents the first Drupal release to fully embrace the potential of object-oriented programming made possible by more recent versions of PHP. (Drupal 7 merely dipped its toes in the proverbial water.) However, after a decade as a procedural application with procedurally-minded developers, the transition from an all-procedural to a mixed object-oriented/procedural system is likely to be bumpy, especially for developers who are still new to object-oriented code.

While a complete treatment of the entirety of object-oriented programming (OOP) would more than fill this entire magazine, a firm grounding in the concepts and syntax of OOP should fit in just a few pages. Shall we have a go at it?

First, an Aside

There is no one true form of OOP. Many different languages have implemented something they called OOP in vastly different ways, sometimes missing features considered common in other languages. JavaScript, for instance, has objects that have little similarity to objects in PHP. For now we are considering only “classic” OOP (that is, OOP built around classes), and PHP in particular. Most of it applies to other C-family languages as well (C++, Java, C#, etc.).

Data Types

To understand what an object is, let’s first understand what came before it. Consider a string. A string is a data type, a definition of a certain type of data. Certain types of data have operations that may be performed on them, to either change them or get information about them. strlen(), for instance, retrieves the length of a string. For another data type, however, such as integer, that operation doesn’t apply but division does.

A large part of the underlying power of OOP is that it allows you to define your own custom data types. These data types are called classes. A class has some internal structure, but just as the implementation details of strings are not your concern, the implementation details of a class should be irrelevant to someone using it. A class consists of properties and methods. Consider:

class Rect {
  protected $height;
  protected $width;
  public function __construct($height, $width) {
    $this->height = $height;
    $this->width = $width;
  }
  public function getArea() {
    return $this->height * $this->width;
  }
}

This code defines a new class (data type) called "Rect". Rect consists of two properties, or attached variables: $height and $width. It also contains two methods, or attached functions. On its own, it doesn’t do anything. We’ve only declared that Rect exists. To make a new variable that is of that new data type (as opposed to a string or integer), we must instantiate a new object of that type, like so:

$height = 5;
$width = 6;
$myrect = new Rect($height, $width);

$myrect is now an object, or instance, of the data type Rect. It is not the same as Rect. For Drupal developers, the relationship between a class and an object is closely analogous to the relationship between a Node Type and an individual Node.

Anatomy of a Class

Methods are operations that can be performed on the object. In the Rect class, we have a method called getArea(). We can call the method by referencing it from the object, like so:

$area = $myrect->getArea();

That invokes the method getArea() on the object $myrect. What that means is that when getArea() is called, the magic variable called $this refers to $myrect. If we had multiple objects of type Rect, calling getArea() on them would refer to their own separate $this.
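To make the role of $this concrete, here is a minimal sketch that repeats the Rect class from above and creates two objects from it. The variable names $small and $large are made up for this illustration.

```php
<?php
// The same Rect class defined earlier in the article.
class Rect {
  protected $height;
  protected $width;
  public function __construct($height, $width) {
    $this->height = $height;
    $this->width = $width;
  }
  public function getArea() {
    // $this is whichever object the method was called on.
    return $this->height * $this->width;
  }
}

// Two separate objects of the same class (hypothetical names).
$small = new Rect(2, 3);
$large = new Rect(10, 20);

// Inside this call, $this refers to $small.
$smallArea = $small->getArea();
// Inside this call, $this refers to $large.
$largeArea = $large->getArea();
```

Each object carries its own $height and $width, so the same method body produces a different result depending on which object it is invoked on.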

Why can’t we just reference $myrect->height and $myrect->width and multiply ourselves? Consider the new keywords public and protected. Those define the visibility of those properties and methods. Public values are those that can be used from outside the object (such as getArea()). Protected values may only be used by code inside the class definition, or in a subclass. (More on that in a moment.) A third visibility, private, is only accessible to code inside that class definition, and not subclasses, but is rarely if ever used in Drupal. A method declared without a visibility is public by default (as is a property declared with the legacy var keyword), but it’s best to always state visibility explicitly.

Good practice in Drupal is to always declare properties protected, and declare methods as public or protected, as appropriate. Public properties and private anything are discouraged.
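As a rough illustration of what visibility means in practice, the sketch below calls a public method and then tries to read a protected property from outside the class. It assumes PHP 7 or later, where the failed access is thrown as an Error that can be caught; older versions halt with a fatal error instead. The $blocked flag is only there for the demonstration.

```php
<?php
// The same Rect class defined earlier in the article.
class Rect {
  protected $height;
  protected $width;
  public function __construct($height, $width) {
    $this->height = $height;
    $this->width = $width;
  }
  public function getArea() {
    return $this->height * $this->width;
  }
}

$myrect = new Rect(5, 6);

// Public method: callable from outside the object.
$area = $myrect->getArea();

// Protected property: not reachable from outside the class.
// Assumption: PHP 7+, where this throws a catchable Error.
$blocked = FALSE;
try {
  $height = $myrect->height;
}
catch (Error $e) {
  $blocked = TRUE;
}
```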

Interfaces

Why is visibility important? Keep in mind that objects are not just associative arrays with funny syntax; they are self-contained data types in their own right. The internal details of Rect, like the internal details of a string, are none of our business as a consumer of that object. That is by design. It’s also one of the hardest things for new OO developers to wrap their head around: You only know and care about a small piece of an object, its public bits. You are freed from worrying about its private bits, because they’re hidden from you.

To the outside world, objects of type Rect look like this:

class Rect {
  public function getArea() { /*...*/ }
}

That’s all we know about it, by design. So important is this concept that it has its own syntax, called an interface. See:

interface RectInterface {
  public function getArea();
}

class Rect implements RectInterface {
  public function __construct($height, $width) { /*...*/ }
  public function getArea() { /*...*/ }
  /*...*/
}

Just as a user interface defines how a human interacts with a program, a class interface defines how calling code interacts with an object, through methods. It’s not the object itself, and you cannot create an object of an interface, only of a class. And just as it’s a security hole if a user can get around the user interface and muck with a program directly, calling code is not supposed to be able to access any portion of an object other than the public interface it chooses to expose.

In this case, interface RectInterface declares "here’s this method getArea()", without any implementation. class Rect declares that it implements that interface, meaning "I promise I have a getArea() method." An interface only defines what a method does. A class defines how a method works.
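The split between "what" and "how" is easier to see with two classes behind one interface. In this sketch, FixedAreaRect and describeArea() are hypothetical names invented for the example; only RectInterface and Rect come from the article.

```php
<?php
interface RectInterface {
  public function getArea();
}

// The article's implementation: area is computed from two sides.
class Rect implements RectInterface {
  protected $height;
  protected $width;
  public function __construct($height, $width) {
    $this->height = $height;
    $this->width = $width;
  }
  public function getArea() {
    return $this->height * $this->width;
  }
}

// Hypothetical second implementation: it stores only the area.
class FixedAreaRect implements RectInterface {
  protected $area;
  public function __construct($area) {
    $this->area = $area;
  }
  public function getArea() {
    return $this->area;
  }
}

// Hypothetical calling code: it relies on the interface alone.
function describeArea(RectInterface $rect) {
  return 'Area: ' . $rect->getArea();
}

$first = describeArea(new Rect(5, 6));
$second = describeArea(new FixedAreaRect(30));
```

The calling code cannot tell the two classes apart, and has no reason to: both keep the promise made by RectInterface.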

Constructors

There’s actually a second method on Rect, __construct(). That oddly named method begins with two underscores, which is PHP-speak for “there’s language magic here.” In this case, it defines a constructor, which is a special method that is called when an object is first created. Notice above we created a new object by calling new Rect($height, $width). What actually happens is this:

  1. A new object of type Rect is created in memory.
  2. The __construct() method of that object is called, using the parameters in the "new" call. (If there is no __construct() method, this step is skipped.)
  3. That method assigns values to properties of $this, which is the object we just created.
  4. A new variable called $myrect is created. It is not the object, but it is a handle for an object.
  5. The handle $myrect is set to point to the object in memory that we just made.

In practice, most constructors are fairly basic and just map parameters to object properties. It’s bad form for constructors to contain more than the most rudimentary logic. It is also bad form for an interface to define a constructor, since different implementations of the same interface could need to be initialized with different information.
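The steps above can be sketched in a few lines. EmptyShape is a hypothetical class added here only to show the case where no constructor exists and step 2 is skipped.

```php
<?php
// The same Rect class defined earlier in the article.
class Rect {
  protected $height;
  protected $width;
  public function __construct($height, $width) {
    // Step 3: map the parameters onto properties of $this.
    $this->height = $height;
    $this->width = $width;
  }
  public function getArea() {
    return $this->height * $this->width;
  }
}

// Hypothetical class with no __construct(): nothing runs on creation.
class EmptyShape {
  public function getArea() {
    return 0;
  }
}

// Steps 1 to 5: the object is created, __construct(5, 6) runs,
// and $myrect becomes a handle to the finished object.
$myrect = new Rect(5, 6);

// No constructor, so no arguments are needed.
$empty = new EmptyShape();

$rectArea = $myrect->getArea();
$emptyArea = $empty->getArea();
```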

Object Handles and Passing by "Reference"

In the previous section, we said that $myrect is not the object itself, but something called a handle to the object. That’s an important distinction. The object exists separately from the variables that refer to it. That’s not true of primitives (integers, strings, etc.) or arrays. When you pass an object to a function or method, you’re actually passing the handle. The handle, like most PHP variables, passes by value, but the object is not copied. Consider:

$myrect = new Rect(3, 4);
function getVolume($rect, $depth) {
  return $rect->getArea() * $depth;
}

$rect does not have an & prefix, so it is passed by value. That means $myrect and $rect are two different variables. Both, however, are simple handles to the same Rect object in memory. That makes it look, most of the time, like objects pass by reference. PHP core developer Sara Golemon has an excellent article that goes into more detail, but for now just note that when you pass an object to a function, you are manipulating the same object, not a copy.
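A small experiment shows the difference between objects and arrays. This sketch adds a setWidth() method to Rect, which is not part of the article's class, and the three helper functions are likewise hypothetical.

```php
<?php
class Rect {
  protected $height;
  protected $width;
  public function __construct($height, $width) {
    $this->height = $height;
    $this->width = $width;
  }
  public function getArea() {
    return $this->height * $this->width;
  }
  // Hypothetical setter, added only for this sketch.
  public function setWidth($width) {
    $this->width = $width;
  }
}

// Receives a copy of the handle, which points to the same object.
function widenRect($rect) {
  $rect->setWidth(10);
}

// Receives a copy of the array itself.
function widenArray($dimensions) {
  $dimensions['width'] = 10;
}

// Reassigning the local handle does not touch the caller's variable.
function replaceRect($rect) {
  $rect = new Rect(1, 1);
}

$myrect = new Rect(3, 4);
widenRect($myrect);
// The caller's object was modified: 3 * 10.
$area = $myrect->getArea();

$dims = array('height' => 3, 'width' => 4);
widenArray($dims);
// The caller's array is unchanged: width is still 4.

replaceRect($myrect);
// Still the original object: 3 * 10.
$areaAfterReplace = $myrect->getArea();
```

The last call is what separates handles from true references: the function can change the object it was given, but it cannot make the caller's variable point somewhere else.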

Inheritance

Often, one data type (class) is simply a slight change from another. That’s very common when both need to share the same interface, but will only differ slightly in implementation. In that case, it is possible to extend a class. See:

class Square extends Rect {
  public function __construct($width) {
    $this->width = $width;
    $this->height = $width;
  }
}

$mysquare = new Square($width);

This declares that Square is a special case of, or subset of, or specialization of, Rect. Everything that you can do to or with Rect applies to Square. In fact, PHP will often raise an error if you break that assumption. Note that we’re not defining the getArea() method. That means that Square will “inherit” the getArea() method from its parent, and calling $mysquare->getArea() will work and execute the exact same code from Rect. However, $this will refer to a Square object. We are not defining $width or $height either, so those are also inherited. If they were declared private in the Rect class, however, we wouldn’t be able to access them here.
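Putting the two classes together, a short sketch confirms what Square inherits. The value 4 and the variables $isRect and $class are chosen for the example.

```php
<?php
// The same Rect class defined earlier in the article.
class Rect {
  protected $height;
  protected $width;
  public function __construct($height, $width) {
    $this->height = $height;
    $this->width = $width;
  }
  public function getArea() {
    return $this->height * $this->width;
  }
}

class Square extends Rect {
  public function __construct($width) {
    // Both properties are inherited from Rect and are protected,
    // so the subclass is allowed to set them.
    $this->width = $width;
    $this->height = $width;
  }
}

$mysquare = new Square(4);

// getArea() is not defined on Square; Rect's code runs.
$area = $mysquare->getArea();

// A Square can be used anywhere a Rect is expected.
$isRect = $mysquare instanceof Rect;

// The object itself is still a Square.
$class = get_class($mysquare);
```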

A class may implement any number of interfaces, but it may only inherit from one parent class. For that reason, think carefully before having classes inherit from one another as it can limit your options in the future. Usually, you will benefit more from composition, or making one object have another object inside it (passed in through the constructor), as that gives you more flexibility.
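Here is one way composition might look, as a sketch rather than a recommendation for any particular design. Box is a hypothetical class: instead of extending Rect, it receives any RectInterface object through its constructor and delegates to it.

```php
<?php
interface RectInterface {
  public function getArea();
}

class Rect implements RectInterface {
  protected $height;
  protected $width;
  public function __construct($height, $width) {
    $this->height = $height;
    $this->width = $width;
  }
  public function getArea() {
    return $this->height * $this->width;
  }
}

// Hypothetical class built by composition, not inheritance.
class Box {
  protected $base;
  protected $depth;
  public function __construct(RectInterface $base, $depth) {
    $this->base = $base;
    $this->depth = $depth;
  }
  public function getVolume() {
    // Box uses only the public interface of the object it holds.
    return $this->base->getArea() * $this->depth;
  }
}

$box = new Box(new Rect(3, 4), 5);
$volume = $box->getVolume();
```

Because Box depends on RectInterface rather than on Rect, any other implementation of that interface can be swapped in without touching Box.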

Object Thinking

Now that we have the basics and syntax laid out, how do we take advantage of it? Or more to the point, how does one think about objects rather than simple variables and arrays?

Whereas procedural code is based on a series of steps, OOP code is based on the interaction of objects. Each object is a self-contained black box, the implementation details of which we don’t know or care about. We care about setting up how those black boxes will interact with each other, by calling methods on other black boxes. We care about the interface of those objects, not their implementation. The biggest strength of OOP is that it forces us, at a language level, to think of interface and implementation separately. Good OOP encourages looser coupling between different systems by reducing their surface area.

As an example, consider the following, which represents most of a simple mail system:

interface MailInterface {
  public function setSubject($subject);
  public function getSubject();
  public function setBody($body);
  public function getBody();
  public function setFrom($from);
  public function addRecipient($recipient);
  public function getRecipients();
  public function addAttachment(SplFileInfo $file);
  public function addHeader($name, $value);
  public function getHeaders();
}

interface MailServerInterface {
  public function send(MailInterface $message);
}

What’s going on here? We’re defining two types of objects: a mail message, and a mail message server. We’re defining them in the abstract, however. We know how a mail message will behave, and we specify that a MailServer is going to get passed a Mail object via a method called send(). Specifying the type of object in the signature of a function or method is known as type hinting, although that’s a misnomer since PHP will happily die with a fatal error if you pass in an object that doesn’t match that “hint” (but will give you a useful error message).
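To see a type hint being enforced, consider this sketch. The interfaces are trimmed to a single method each to keep it short, and StubMail and RecordingMailServer are hypothetical classes. It assumes PHP 7 or later, where a mismatched argument is thrown as a TypeError; PHP 5 reports a catchable fatal error instead.

```php
<?php
// Trimmed-down versions of the article's interfaces.
interface MailInterface {
  public function getSubject();
}

interface MailServerInterface {
  public function send(MailInterface $message);
}

// Hypothetical message with a fixed subject.
class StubMail implements MailInterface {
  public function getSubject() {
    return 'Hello';
  }
}

// Hypothetical server that only remembers what it was asked to send.
class RecordingMailServer implements MailServerInterface {
  protected $sent = array();
  public function send(MailInterface $message) {
    $this->sent[] = $message->getSubject();
  }
  public function getSent() {
    return $this->sent;
  }
}

$server = new RecordingMailServer();

// Accepted: StubMail implements MailInterface.
$server->send(new StubMail());

// Rejected: a string is not a MailInterface.
// Assumption: PHP 7+, where this throws a TypeError.
$rejected = FALSE;
try {
  $server->send('not a mail object');
}
catch (TypeError $e) {
  $rejected = TRUE;
}
```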

Everything else is up to us to define, however we choose, as long as those conditions are met. Consider the simple implementation:

class BasicMail implements MailInterface {
  protected $subject;
  protected $body;
  protected $headers = array();
  protected $attachments = array();
  public function addHeader($name, $value) {
    $this->headers[$name] = $value;
  }
  public function setFrom($name) {
    $this->addHeader('From', $name);
  }
  // ...
}
class MailServer implements MailServerInterface {
  public function send(MailInterface $message) {
    mail($message->getRecipients(), $message->getSubject(), $message->getBody(), $message->getHeaders());
  }
}

Here we’re defining a basic mail object. We’ve omitted the boring bits. Note that we’re not saving the “from” value on its own. There’s nothing in the interface that says we have to track the from value separately, just that we set it via setFrom(). Since From is an optional header (yes, e-mail is a wacky system), we’ll store it that way. We could change that, however, and save it some other way if we wanted.

The default MailServer implementation, then, just passes data on to PHP’s mail system. E-mail messages are generally nowhere near that simple. If there’s an attachment, then you have to MIME-encode the file and attach it to the body value with some magic incantation, and that incantation needs to also be reflected in corresponding magic incantations in additional headers. However, that is an implementation detail. The getBody() and getHeaders() methods get to deal with that mess. MailServer does not know or care about those details.

Now, consider an alternate implementation of the mail server that does not actually send e-mail. That’s useful for debugging. See:

class DebugMailServer implements MailServerInterface {
  protected $logger;
  public function __construct(LogInterface $logger) {
    $this->logger = $logger;
  }
  public function send(MailInterface $message) {
    $value = $message->getRecipients() . $message->getSubject() . $message->getBody() . $message->getHeaders();
    $this->logger->log('debug', $value);
  }
}

This class requires a logger of some sort, matching the LogInterface (which defines the log() method) specified in the constructor. Then, rather than sending an actual e-mail, it just logs what would have been sent. The Mail object does not know or care that the mail isn’t going anywhere. And DebugMailServer does not know or care whether the LogInterface implementation is saving logs to syslog, or to the database, or wherever.
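The pieces can be wired together in a self-contained sketch. To keep it short, MailInterface is trimmed to subject and body, the log() signature is inferred from the call shown above, and ArrayLogger is a hypothetical logger that keeps entries in memory. None of this is meant as the definitive form of these classes.

```php
<?php
// Trimmed-down interfaces; the article's MailInterface has more methods.
interface MailInterface {
  public function setSubject($subject);
  public function getSubject();
  public function setBody($body);
  public function getBody();
}

interface MailServerInterface {
  public function send(MailInterface $message);
}

// Assumed shape of LogInterface, based on the log('debug', ...) call.
interface LogInterface {
  public function log($level, $message);
}

class BasicMail implements MailInterface {
  protected $subject;
  protected $body;
  public function setSubject($subject) {
    $this->subject = $subject;
  }
  public function getSubject() {
    return $this->subject;
  }
  public function setBody($body) {
    $this->body = $body;
  }
  public function getBody() {
    return $this->body;
  }
}

// Hypothetical logger that stores entries in an array.
class ArrayLogger implements LogInterface {
  protected $entries = array();
  public function log($level, $message) {
    $this->entries[] = $level . ': ' . $message;
  }
  public function getEntries() {
    return $this->entries;
  }
}

class DebugMailServer implements MailServerInterface {
  protected $logger;
  public function __construct(LogInterface $logger) {
    $this->logger = $logger;
  }
  public function send(MailInterface $message) {
    // Log instead of sending; where the log ends up is not our concern.
    $value = $message->getSubject() . ' | ' . $message->getBody();
    $this->logger->log('debug', $value);
  }
}

$logger = new ArrayLogger();
$server = new DebugMailServer($logger);

$mail = new BasicMail();
$mail->setSubject('Hello');
$mail->setBody('Testing');
$server->send($mail);

$entries = $logger->getEntries();
```

Swapping ArrayLogger for a syslog or database logger would require no change to DebugMailServer, and swapping DebugMailServer for a real mail server would require no change to BasicMail.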

The ability to say “I do not know or care what is happening on the other side of this interface wall” is one of the hallmarks of good OOP code. It’s also one of the hardest things for traditionally procedural programmers to let go of; you don’t want to be responsible for what happens on the other side of that interface, so set up your code such that you don’t have to be concerned with what happens over there.

Conclusion

There is far more that could be said about OOP, but hopefully this should be enough to get you started. There are many additional resources online, including the PHP manual itself. A number of other useful articles are included in the Further Reading section.

The most important take-away, however, is remembering that in OOP code, you only care about the interface. The interface may not have any relationship at all to the implementation. If you catch yourself thinking “how do I access the protected properties,” then you are still not thinking of an object as a self-contained black box. Instead, you should be asking yourself “what do I want this object to do, and how do I tell it to do so?” How it then goes about doing it is, by design, a separate question.

Comments

Here is a similar article I put together for the Drupal community -

http://www.paulbooker.co.uk/drupal-development-services/article/quick-in...

Nice, thanks for sharing Paul, great website I will be bookmarking for future reference!

Nice article! One thing: the Mail Server classes should implement MailServerInterface instead of MailInterface.