Should Your Objects Validate Themselves?

It seems quite common in the ColdFusion community to see objects with a validate() function. It is there for cases where your object may contain invalid data, such as values supplied from a form.

This leads to a good question: should your objects be allowed to have an invalid state? Let’s consider an example where you might have a simple user object which has a first name, last name and date of birth.

Let’s suppose that for this object to be valid we have the following rules:
  • The first name and last name are required.
  • The birth date is optional, but if supplied it should not be more than 80 years ago.

Scenario 1: Invalid is OK, Dude, it all gets sorted in the end

Suppose you are creating a new User and have a simple user form. When your user form is submitted, you may have some code like this:

<!--- Create and populate our user --->
<cfset user = createObject("component","User").init()>
<cfset user.setFirstName(form.firstName)>
<cfset user.setLastName(form.lastName)>
<cfset user.setDateOfBirth(form.dob)>
 
<!--- Validate the object and handle any errors --->
<cfset errors = user.validate()>
<cfif arrayLen(errors) eq 0>
	<!--- All good, so on to the next thing --->
<cfelse>
	<!--- Otherwise, handle the errors --->
</cfif>

So in this example, our user object is potentially in an invalid state, but a quick validate() lets us know if we are home free or if our user needs a little more attention.

Some things to note:

  • We can set any values at all in our user object
  • Validation is done within the object which returns an array of errors.
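
To make this concrete, here is a minimal sketch of what such a User component might look like. It is an assumption for illustration, not the actual component: the error messages are made up, and the checks are simply the two rules listed earlier.

```cfml
<!--- User.cfc (illustrative sketch): setters accept anything,
      validate() reports the problems afterwards --->
<cfcomponent output="false">

	<cffunction name="init" access="public" returntype="any" output="false">
		<cfset variables.firstName = "">
		<cfset variables.lastName = "">
		<cfset variables.dob = "">
		<cfreturn this>
	</cffunction>

	<!--- The setters store whatever they are given, valid or not --->
	<cffunction name="setFirstName" access="public" returntype="void" output="false">
		<cfargument name="firstName" type="string" required="true">
		<cfset variables.firstName = arguments.firstName>
	</cffunction>

	<cffunction name="setLastName" access="public" returntype="void" output="false">
		<cfargument name="lastName" type="string" required="true">
		<cfset variables.lastName = arguments.lastName>
	</cffunction>

	<cffunction name="setDateOfBirth" access="public" returntype="void" output="false">
		<cfargument name="dob" type="string" required="true">
		<cfset variables.dob = arguments.dob>
	</cffunction>

	<!--- Returns an array of error messages; an empty array means valid --->
	<cffunction name="validate" access="public" returntype="array" output="false">
		<cfset var errors = arrayNew(1)>
		<cfif len(trim(variables.firstName)) eq 0>
			<cfset arrayAppend(errors, "First name is required.")>
		</cfif>
		<cfif len(trim(variables.lastName)) eq 0>
			<cfset arrayAppend(errors, "Last name is required.")>
		</cfif>
		<!--- Date of birth is optional, so only check it when supplied --->
		<cfif len(trim(variables.dob)) gt 0>
			<cfif not isDate(variables.dob)>
				<cfset arrayAppend(errors, "Date of birth is not a valid date.")>
			<cfelseif dateCompare(variables.dob, dateAdd("yyyy", -80, now())) lt 0>
				<cfset arrayAppend(errors, "Date of birth must not be more than 80 years ago.")>
			</cfif>
		</cfif>
		<cfreturn errors>
	</cffunction>

</cfcomponent>
```

Nothing in this sketch stops other code from using the object before validate() has been called, which is the point of the discussion that follows.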

Scenario 2: Not OK, really really not OK

So what’s the problem with the code above? Looks good to me. Well let’s consider an alternative approach:

<!--- First validate our data, various options on how this might be done. --->
... some validation and other stuff happens here ...
 
<!--- Only create our object if there are no errors --->
<cfif arrayLen(errors) eq 0>
	<cfset user = createObject("component","User").init(firstName, lastName, dob)>
	<!--- All good, so on to the next thing, e.g. saving our user --->
<cfelse>
	<!--- Otherwise, handle the errors --->
</cfif>

Couple of things to note:

  • The user’s initial values are set in the init() function.
  • If either the first or last name is blank then an error will be thrown.
  • If the date is invalid a different error will be thrown.
  • So in this example, the user object will only exist in a valid state. It cannot exist with invalid data.
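
A minimal sketch of a User component along these lines is shown below. The exception type names (User.InvalidName, User.InvalidDateOfBirth) and the messages are hypothetical, chosen only for illustration; the rules are the two listed at the start of the post.

```cfml
<!--- User.cfc (illustrative sketch): the object can only be created
      with valid data, otherwise init() throws --->
<cfcomponent output="false">

	<cffunction name="init" access="public" returntype="any" output="false">
		<cfargument name="firstName" type="string" required="true">
		<cfargument name="lastName" type="string" required="true">
		<cfargument name="dob" type="string" required="false" default="">

		<!--- Both names are required (hypothetical exception type) --->
		<cfif len(trim(arguments.firstName)) eq 0 or len(trim(arguments.lastName)) eq 0>
			<cfthrow type="User.InvalidName" message="First name and last name are required.">
		</cfif>

		<!--- Date of birth is optional; when supplied it must be a real date
		      that is not more than 80 years ago --->
		<cfif len(trim(arguments.dob)) gt 0>
			<cfif (not isDate(arguments.dob)) or (dateCompare(arguments.dob, dateAdd("yyyy", -80, now())) lt 0)>
				<cfthrow type="User.InvalidDateOfBirth" message="Date of birth must be a valid date not more than 80 years ago.">
			</cfif>
			<!--- Store a real date object rather than the original string --->
			<cfset variables.dob = parseDateTime(arguments.dob)>
		<cfelse>
			<cfset variables.dob = "">
		</cfif>

		<cfset variables.firstName = trim(arguments.firstName)>
		<cfset variables.lastName = trim(arguments.lastName)>
		<cfreturn this>
	</cffunction>

	<cffunction name="getFirstName" access="public" returntype="string" output="false">
		<cfreturn variables.firstName>
	</cffunction>

	<cffunction name="getLastName" access="public" returntype="string" output="false">
		<cfreturn variables.lastName>
	</cffunction>

	<cffunction name="getDateOfBirth" access="public" returntype="any" output="false">
		<cfreturn variables.dob>
	</cffunction>

</cfcomponent>
```

Because this sketch has no setters, the only way to get data into the object is through init(), so any instance that exists has already passed the checks.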

    When a User might not be a User …

    In the first scenario, our object can contain ANY data at all, in particular it can contain ANY data our trusty users may like to enter. Suppose I handed you one of these user objects. You cannot assume that it is valid, because there is nothing to stop it from being invalid. Just to be sure, you might need to call validate() wherever you need to ensure the object data was OK to use.

    Looking a little further, it seems to me that our User object is really just an “objectified” version of our form data. We let any data go into our object, then we ask the object to validate itself.

    Our User object may have lots of behaviour, and that helps us to think this is a good useful object, but something doesn’t seem right.

    So perhaps what we are calling a User object is not really a “User” at all. Perhaps our object is really a “UserForm” object. Maybe that makes more sense – forms need to be validated, so userForm.validate() might be much better than user.validate().

    This is an important distinction, so let’s redefine our objects:

    User: Is a business object and always exists in a valid state. Whenever you have a user object you know it is completely valid.

    UserForm: Is a representation of a form. It can validate your form data and transform your form data into a real live business object.

    Giving our UserForm a spin …

    So if we are dealing with a UserForm now, then what should our code look like? It could look something like this:

    <!--- Create our UserForm object and populate with the form data --->
    <cfset userForm = createObject("component","UserForm").init()>
    <cfset userForm.populate(form)>
     
    <!--- The validation occurs when you call a function called hasErrors() --->
    <cfif not userForm.hasErrors()>
    	<!--- All good, so now we can get a nice clean User object --->
    	<cfset user = userForm.getUserBean()> <!--- Nice function! --->
    <cfelse>
    	<!--- Oh, errors here, so handle them --->
    	<cfset errors = userForm.getErrors()>
    	... display our form again here with errors ...
    </cfif>

    So in this example, the user form object knows how to create a user object based on valid form data.
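
    One possible shape for such a UserForm component is sketched below. It is only an assumption about how populate(), hasErrors(), getErrors() and getUserBean() could fit together; the error messages and the UserForm.Invalid exception type are hypothetical, and it presumes a User component whose init() accepts firstName, lastName and dob.

    ```cfml
    <!--- UserForm.cfc (illustrative sketch) --->
    <cfcomponent output="false">

    	<cffunction name="init" access="public" returntype="any" output="false">
    		<cfset variables.data = structNew()>
    		<cfset variables.data.firstName = "">
    		<cfset variables.data.lastName = "">
    		<cfset variables.data.dob = "">
    		<cfset variables.errors = arrayNew(1)>
    		<cfreturn this>
    	</cffunction>

    	<!--- Copy only the fields we know about; anything goes at this stage --->
    	<cffunction name="populate" access="public" returntype="void" output="false">
    		<cfargument name="formData" type="struct" required="true">
    		<cfset var field = "">
    		<cfloop list="firstName,lastName,dob" index="field">
    			<cfif structKeyExists(arguments.formData, field)>
    				<cfset variables.data[field] = trim(arguments.formData[field])>
    			</cfif>
    		</cfloop>
    	</cffunction>

    	<!--- Runs the validation and reports whether anything failed --->
    	<cffunction name="hasErrors" access="public" returntype="boolean" output="false">
    		<cfset variables.errors = arrayNew(1)>
    		<cfif len(variables.data.firstName) eq 0>
    			<cfset arrayAppend(variables.errors, "First name is required.")>
    		</cfif>
    		<cfif len(variables.data.lastName) eq 0>
    			<cfset arrayAppend(variables.errors, "Last name is required.")>
    		</cfif>
    		<cfif len(variables.data.dob) gt 0>
    			<cfif not isDate(variables.data.dob)>
    				<cfset arrayAppend(variables.errors, "Date of birth is not a valid date.")>
    			<cfelseif dateCompare(variables.data.dob, dateAdd("yyyy", -80, now())) lt 0>
    				<cfset arrayAppend(variables.errors, "Date of birth must not be more than 80 years ago.")>
    			</cfif>
    		</cfif>
    		<cfreturn arrayLen(variables.errors) gt 0>
    	</cffunction>

    	<cffunction name="getErrors" access="public" returntype="array" output="false">
    		<cfreturn variables.errors>
    	</cffunction>

    	<!--- Only hands out a User when the form data is valid --->
    	<cffunction name="getUserBean" access="public" returntype="any" output="false">
    		<cfif hasErrors()>
    			<cfthrow type="UserForm.Invalid" message="Cannot create a User from invalid form data.">
    		</cfif>
    		<cfreturn createObject("component", "User").init(variables.data.firstName, variables.data.lastName, variables.data.dob)>
    	</cffunction>

    </cfcomponent>
    ```

    In this sketch the form object carries its own copy of the rules, so if User also enforces them the logic exists in two places.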

    So should your objects be allowed to have an invalid state?

    It seems to me that “business objects” (or “beans”) that can exist in an invalid state are really just wrappers for potentially invalid user supplied data.

    The same scenario arises when importing data from an external file. We need to import the data into a “temporary” location first, validate it, and move it to its final storage location (in a bean or in the database) only after it has validated successfully.

    Objects that are permitted to exist in an invalid state also seem less useful. With an always-valid User like the one in our example above, we know that the user’s name will always be a non-empty string and the date will either be an empty string or a real date object (rather than a date string, for example). This knowledge gives us complete confidence in using our object in any scenario with no surprises.

    What do you think?


    12 Comments

    1. Posted June 12, 2009 at 5:52 am | Permalink

      The problem I have with the UserForm object is that I might have many forms which interact with the User object. Should I duplicate all of my validation logic into multiple UserForm objects (e.g., UserRegisterForm, UserLoginForm, UserMailingPrefsForm, etc.)? Should I have one "master" UserForm object with a bunch of conditional logic to determine which actual form I’m using? Neither of those feel right to me.

      Another point is that the logic dictating whether data for a user is valid can change based on the context. When first registering a User perhaps only an email address and password are required, but when updating an address then various address fields are required. Again, would this necessitate multiple UserForm objects?

      Also, the logic that we’re defining here applies to the User object, not to one or more "form" objects. Maybe I’m not even using a form to update the User. Maybe the user can update their name via email. Should I still call the object used in that situation a "form" object, or is there now a UserNameChangeEmail Object?

      I feel that the logic that represents the business rules that determine whether data is valid for a User should be linked to the User object in some way, not to "form" objects.

      I store all of those business rules in metadata, and that metadata can then be used to automatically generate code which will enforce the validation rules. I have actually developed a complete system to do that, which is available as open source software. Currently I am using that in a scenario in which the User object can validate itself, but I see no reason why that metadata couldn’t be used by a UserForm object, if that’s the route that one chose to take. I just don’t like the idea of creating extra objects that aren’t really necessary.

      Perhaps OO purists will call me names for this, but I think this "an object cannot be in an invalid state" thing is a red herring. I prefer not to worry about "rules that must be followed", but rather concentrate on allowing fundamental OO design principles to guide me, and take a practical approach. I am in total control of what happens in my application, so I know that I’m never going to get a User object that isn’t valid. If a User object is populated with data from an external source, then it gets validated and if it isn’t valid it gets discarded. There is no danger of an invalid User somehow ending up floating around my system.

      Whew, I didn’t realize I had that much to say on the topic ;-) Thanks for raising this interesting issue.

    2. Brian FitzGerald
      Posted June 12, 2009 at 7:16 am | Permalink

      Really nice post. You outlined the issue concisely. I have to give +1 to Bob’s response which hits on what seem like some key issues with the UserForm approach.

      To me, the UserForm approach seems like an additional layer of objects to create and maintain unnecessarily. When you consider the number of objects in a system, Product, Order, User, Notification, etc., it seems you’re doubling your business objects without much incentive if you have ProductForm, OrderForm, UserForm, NotificationForm, etc.

    3. Posted June 12, 2009 at 7:23 am | Permalink

      Hi Bob

      Thanks for your comments – you mentioned quite a few things I hadn’t considered and gave me some great things to think through.

      > The problem I have with the UserForm object is that I might have many
      > forms which interact with the User object …

      In the case where the user data is used in multiple form contexts, I would consider that the UserForm represents a part of the form rather than a whole form. It would be used to validate those couple of fields within the larger context.

      > Another point is that the logic dictating whether data for a user is
      > valid can change based on the context …

      That’s a good point, hadn’t thought about it. I would still consider that there would only be one UserForm object which is responsible for being aware of all of the possible fields, but you could perhaps supply a NewRegUserFormValidator or UpdateRegUserFormValidator to the init() which knows how to validate it in specific contexts?

      > Maybe I’m not even using a form to update the User.
      > Maybe the user can update their name via email.
      > Should I still call the object used in that situation a "form" object …

      The data could come from a variety of sources – forms, email, imported data, etc. But perhaps the idea is still the same because it is still just a collection of fields, but maybe I need a more abstracted name if it is used in various ways. This object is essentially taking external data and transforming it into an object, perhaps UserTransformer? Not sure. I agree the name needs some thought but perhaps the idea is still ok.

      > I feel that the logic that represents the business rules that
      > determine whether data is valid for a User should be linked
      > to the User object in some way, not to "form" objects.

      Yes, I agree. If we consider that an object can only exist in a valid state, then there must be validation logic within the object that ensures this, e.g. throwing errors when attempting to change the object into an invalid state. In this scenario, the validation would be handled by attempting to set the data and catching errors – now *this* I am stuck on. Seems a bit clumsy, but it does prevent duplicating the logic in the separate object.
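
      A rough sketch of what "set the data and catch the errors" could look like in the calling code follows. It assumes a User component whose init() throws errors of the hypothetical types User.InvalidName and User.InvalidDateOfBirth; neither name comes from an existing library.

      ```cfml
      <!--- Try to build the User and turn any thrown error into a message --->
      <cfset errors = arrayNew(1)>
      <cftry>
      	<cfset user = createObject("component", "User").init(form.firstName, form.lastName, form.dob)>
      	<cfcatch type="User.InvalidName">
      		<cfset arrayAppend(errors, cfcatch.message)>
      	</cfcatch>
      	<cfcatch type="User.InvalidDateOfBirth">
      		<cfset arrayAppend(errors, cfcatch.message)>
      	</cfcatch>
      </cftry>

      <cfif arrayLen(errors) eq 0>
      	<!--- All good, the user variable holds a valid User --->
      <cfelse>
      	<!--- Otherwise, handle the errors --->
      </cfif>
      ```

      One source of the clumsiness is visible here: init() stops at the first problem it finds, so in this sketch the errors array never holds more than one message per attempt.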

      > I store all of those business rules in metadata …
      > I have actually developed a complete system to do that,
      > which is available as open source software.

      That is an awesome mighty effort Bob! Yes, your validation library would fit perfectly wherever needed. I will have to take a closer look!

      For anyone reading, it is here
      http://validatethis.riaforge.org/

      > Perhaps OO purists will call me names for this, but I think this
      > "an object cannot be in an invalid state" thing is a red herring.
      > I prefer not to worry about "rules that must be followed",
      > but rather concentrate on allowing fundamental OO design principles
      > to guide me, and take a practical approach …
      > I know that I’m never going to get a User object that isn’t valid.
      > If a User object is populated with data from an external source,
      > then it gets validated and if it isn’t valid it gets discarded.
      > There is no danger of an invalid User somehow ending up
      > floating around my system.

      Through your comments I realised that an object’s data can potentially come from anywhere: form, email, imported file, database, etc. With objects that are populated from a database though, there is a possibility that the objects could come in invalid, perhaps because another process had changed the data. If objects were required to validate themselves then this would mean that validate() would need to be called for each object read from the db?

      If the object could only exist in a valid state then it would bomb out as soon as the object was being read. If the object can exist in an invalid state then any errors would show up later, perhaps as side effects elsewhere in the system?

      Thanks

      Kevan

    4. Posted June 12, 2009 at 7:32 am | Permalink

      Hi Brian

      Yes, I am thinking this through more after Bob’s comments.

      I suppose the main role of the extra object is to act as a data transformer. As Bob mentioned, the forms and objects may not have a one to one relationship – one form could result in many objects, or one form could represent only part of an object.

      Perhaps this extra object has similarities to a Gateway that transforms database data into objects. In this case it transforms form data into objects, but also provides validation.

      Found this link which appears related:
      http://www.scribd.com/doc/10894361/Transformation-Interface-Design-Pattern-Andy-Bulka

      Thanks

    5. Posted June 12, 2009 at 7:58 am | Permalink

      Hi Kevan,

      Thanks for your thoughtful responses to my comments. Regarding the issue of objects being in an invalid state, I’m not concerned because I have total control over the system and the data. It isn’t possible for an object to be persisted to the database in an invalid state, so it’s also not possible for one to be retrieved in an invalid state. I can see how if this scenario doesn’t apply (e.g., data can be populated from outside your system) that you’d have to have a way of dealing with that. It’s just that I don’t have that concern so I don’t need to solve a problem that doesn’t exist for me.

    6. Posted June 12, 2009 at 3:18 pm | Permalink

      Hi Bob, Yes that makes sense; if your objects are persisted and retrieved through a standard mechanism then there would never be a problem, and realistically if you had a separate process changing this data incorrectly then it would just be a bug that needed to be fixed, not a design problem.

      Thanks

    7. Posted June 12, 2009 at 11:34 pm | Permalink

      I might like to add an alternative point of view, one which Kevan touched lightly in the example in which the arguments were passed within the init function.

      What is needed is to really look at the essence of the validation problem. There are two ways in which the object can get to an invalid state:

      1) when an individual property has an invalid value independently from the values of all other properties, e.g. "password has to be at least 6 characters long"

      2) when the invalid state is caused by the relation between two or more properties, e.g. if property A is given, then property B must be non-empty.

      Checking for case 1 is simple enough by putting the appropriate checks in "setter" methods; however, case 2 is more complex since the logic, as Kevan points out, cannot just reside in the setters: the object will be invalid in the time between the calls to the setters for the two related properties.

      In my opinion, this problem needs to be dealt with at the time when the object is modeled. There is nothing in OO-land that says that you "have to" have a setter method for every single property in your object. If you know that the business rules for the object indicate that it should not be allowed to be in an invalid state, but at the same time the validity of the state involves cross-checking multiple properties, then your object should provide a way in which all these properties are entered at the same time. This is pretty much what Kevan did when he passed all the values to the init() method. Now, using init() is not the only way to do this. You can very well have a setter method that takes some sort of transfer object or value object that is just a container for multiple properties (without validation). You can even use a plain old "struct" type for this and keep things simple. This method (init(), setMemento(), setState(), or whatever) would take this container of data, apply whatever validation rules are needed and then react accordingly, either by accepting the values or throwing errors. This way the business logic that determines when the object is valid or not remains encapsulated within the object, but you ensure that your object will ALWAYS be in a valid and consistent state.
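
      As a sketch of this idea, the hypothetical Subscription component below takes all of its data as one struct. The component, its two properties (wantsNewsletter and email), the cross-property rule and the Subscription.InvalidState exception type are all invented for illustration; the point is only that everything is checked before any of it is stored.

      ```cfml
      <!--- Subscription.cfc (hypothetical example) --->
      <cfcomponent output="false">

      	<cffunction name="init" access="public" returntype="any" output="false">
      		<cfargument name="state" type="struct" required="true">
      		<cfset setState(arguments.state)>
      		<cfreturn this>
      	</cffunction>

      	<!--- Accepts all related properties at once so they can be cross-checked --->
      	<cffunction name="setState" access="public" returntype="void" output="false">
      		<cfargument name="state" type="struct" required="true">
      		<cfset var wantsNewsletter = false>
      		<cfset var email = "">

      		<!--- Read the incoming values into local variables first --->
      		<cfif structKeyExists(arguments.state, "wantsNewsletter")>
      			<cfif not isBoolean(arguments.state.wantsNewsletter)>
      				<cfthrow type="Subscription.InvalidState" message="wantsNewsletter must be true or false.">
      			</cfif>
      			<cfset wantsNewsletter = arguments.state.wantsNewsletter>
      		</cfif>
      		<cfif structKeyExists(arguments.state, "email")>
      			<cfset email = trim(arguments.state.email)>
      		</cfif>

      		<!--- Cross-property rule: a newsletter subscriber needs an email --->
      		<cfif wantsNewsletter and len(email) eq 0>
      			<cfthrow type="Subscription.InvalidState" message="An email address is required to receive the newsletter.">
      		</cfif>

      		<!--- Only now is the object's own state touched --->
      		<cfset variables.wantsNewsletter = wantsNewsletter>
      		<cfset variables.email = email>
      	</cffunction>

      </cfcomponent>
      ```

      Because the values are validated in local variables and assigned only at the end, a failed setState() call leaves the previous state untouched.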

      Cheers,

      Oscar

    8. Posted June 13, 2009 at 6:21 am | Permalink

      Hi Oscar

      This reminded me that if you don’t pass values via the init() function then your object needs to internally set itself to a valid initial state.

      In our example above where a first and last name were required, the object might set default values of the first name and last name fields to be "First Name" and "Last Name" if that was appropriate. If having these default "place holder" values was not appropriate then these would need to be mandatory in the init().

      Thanks for expanding on that part, great points.

    9. Posted June 13, 2009 at 10:39 am | Permalink

      validation at init() doesn’t work when the preference(s) can be set by the user after the object has been created.

    10. Posted June 13, 2009 at 2:27 pm | Permalink

      Hi Henry

      There are a couple of ideas to consider:

      1) When you are designing your object you would not necessarily have setters for every field.

      2) The validation probably doesn’t sit in the init() function. It’s likely that the init() would call the setters/other functions (some possibly private) which would in turn perform the required validation.

      Couple of scenarios:

      a) You may have an object with no setters – the only way to get data into the object is via init().

      b) You may have an object with no setters – all data is set via a populate() function which takes a struct of values.

      c) You have setters that set more than single values. Rather than setFirstName(name) and setLastName(name) you might have setName(first,last).

      d) You only have setters for a limited set of fields. The other fields are modified by other functions. For example, firstName and lastName are set on init() but are modified by a function called makeDefaultUser().

    11. Posted June 13, 2009 at 2:35 pm | Permalink

      Thanks.. here’re my thoughts.

      1. I understand, getters and setters should be avoided if possible, but practically there are many objects that require setters, especially the beans.

      2. validation on init() is limited for web app since validation is only done on instantiation… so how can I validate when I needed to update certain fields in the future after the object is loaded back from the DB?

      a.) that’s for immutable object. How about object that represents a row in DB, and the user changed a field in a form? (e.g. shipping address?) I still need the setter(s)..

      b.) populate() seems worse than setXXX(), ’cause u need to document what keys in struct are needed somewhere, and made the code less self-documenting.

      c.) I don’t see how that’s better than having setFirstName() and setLastName()

      d.) how can u use makeDefaultUser() to set first & lastname?

    12. Posted June 13, 2009 at 6:45 pm | Permalink

      Hi Henry

      Yes, use of getters, and setters to a lesser extent, needs some extra consideration.

      If I am following correctly, your object can be initialised via init() but also can have any setters you need. For example, your User object may have functions:

      init(firstName,lastName,dob)
      setName(firstName,lastName)
      setDateOfBirth(dob)

      Your init() function would probably call setName() and setDateOfBirth(). The actual validation would occur in these two functions, not in the init() function.

      So, your objects are not immutable, even though they are passed initial values via init().

      Your populate() function could still self document by listing out all of the possible arguments in the function, and then call populate(argumentcollection=data).

      I don’t really think of populate() as being better or worse than setters. Each object should be modelled according to its needs. Sometimes setting values in one bunch is useful, sometimes providing setters is useful. And you can provide both if that helps.
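
      Putting those pieces together, a User component along these lines might look like the sketch below. The exception types and messages are hypothetical; the function names are the ones listed above plus populate().

      ```cfml
      <!--- User.cfc (illustrative sketch): init() and populate() delegate
            to setters, and the setters do the validation --->
      <cfcomponent output="false">

      	<cffunction name="init" access="public" returntype="any" output="false">
      		<cfargument name="firstName" type="string" required="true">
      		<cfargument name="lastName" type="string" required="true">
      		<cfargument name="dob" type="string" required="false" default="">
      		<cfset setName(arguments.firstName, arguments.lastName)>
      		<cfset setDateOfBirth(arguments.dob)>
      		<cfreturn this>
      	</cffunction>

      	<!--- Both names are set together, so neither can be left blank --->
      	<cffunction name="setName" access="public" returntype="void" output="false">
      		<cfargument name="firstName" type="string" required="true">
      		<cfargument name="lastName" type="string" required="true">
      		<cfif len(trim(arguments.firstName)) eq 0 or len(trim(arguments.lastName)) eq 0>
      			<cfthrow type="User.InvalidName" message="First name and last name are required.">
      		</cfif>
      		<cfset variables.firstName = trim(arguments.firstName)>
      		<cfset variables.lastName = trim(arguments.lastName)>
      	</cffunction>

      	<!--- Optional; when supplied it must be a date not more than 80 years ago --->
      	<cffunction name="setDateOfBirth" access="public" returntype="void" output="false">
      		<cfargument name="dob" type="string" required="true">
      		<cfif len(trim(arguments.dob)) eq 0>
      			<cfset variables.dob = "">
      		<cfelseif (not isDate(arguments.dob)) or (dateCompare(arguments.dob, dateAdd("yyyy", -80, now())) lt 0)>
      			<cfthrow type="User.InvalidDateOfBirth" message="Date of birth must be a valid date not more than 80 years ago.">
      		<cfelse>
      			<cfset variables.dob = parseDateTime(arguments.dob)>
      		</cfif>
      	</cffunction>

      	<!--- Listing the arguments documents which keys populate() expects --->
      	<cffunction name="populate" access="public" returntype="void" output="false">
      		<cfargument name="firstName" type="string" required="true">
      		<cfargument name="lastName" type="string" required="true">
      		<cfargument name="dob" type="string" required="false" default="">
      		<cfset setName(arguments.firstName, arguments.lastName)>
      		<cfset setDateOfBirth(arguments.dob)>
      	</cffunction>

      </cfcomponent>
      ```

      With a struct called data holding the keys firstName, lastName and dob, the call would be user.populate(argumentcollection=data).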

      In the makeDefaultUser() function, it may be implemented something like:

      <cffunction name="makeDefaultUser" output="false">
      	<cfset variables.firstName = "First Name">
      	<cfset variables.lastName = "Last Name">
      	<cfset variables.dob = "">
      </cffunction>