Relational Database Persistence with NHibernate, Part 1
Take advantage of the best relational databases and object-oriented design have to offer without compromising either. Using an object/relational mapping framework like NHibernate, you can significantly reduce the amount of code you write (and therefore potential bugs) for performing standard operations against your database and save the heavy ADO.NET coding for the complicated scenarios.
In my work, I have observed that data access is one of the more complex and error-prone elements of many business computer systems. Developers always seem to fight with the persistence mechanism and the related problems it causes for deployment, upgrades/changes, integration, reporting, and other important aspects of a complete enterprise line-of-business application.
When I moved to .NET and started leveraging its OO capabilities, it seemed data access became even more complex. I found that I frequently had to compromise either OO principles or data access principles and it never seemed to work out as well as I had hoped. As I started trying to solve more complicated challenges like reporting, integration with other systems, schema migration from one version to the next, and other enterprise concerns, it only got worse and more complex.
As time went on, I spent more time researching the problem and became aware of a new type of solution. The theory is that, in object-oriented systems, there is a fundamental mismatch between the inherent capabilities, strengths, and weaknesses of object-oriented design and the inherent capabilities, strengths, and weaknesses of a relational data structure design. Many of the problems and resultant defects commonly observed in the persistence layer of an application, the theory suggests, trace back to this inherent mismatch. The theory calls this mismatch the “impedance mismatch.” An application architect who operates under this theory should therefore aim to reduce complexity by avoiding this mismatch.
Primary Source of Complexity: The Problem Domain
All applications exist to solve one or more problems. The “problem domain” (or “domain” for short) is the set of problems that the application attempts to solve. The domain is where the primary complexity lies and will demand the most attention from programmers, designers, and architects. Statements such as “Inactive employees should not be scheduled for work items” usually express these domain problems. Since I like to use object-oriented systems such as .NET, I would likely start out trying to model this problem using an “Employee” object and a “Work Item” object. I might have an “Active” Boolean property on Employee as well as a method called “ScheduleWorkItem” that would attempt to schedule a work item for that Employee and report problems if any exist. This is a contrived example, of course, but you will hopefully get the general idea. I have found that object-oriented design gives me the right balance of rigidity and flexibility necessary to address most of the complexity I run into in most line-of-business problem domains.
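To make this contrived example a bit more concrete, here is a minimal sketch of how those two entities might look in C#. The names and the single business rule come from the statement above; everything else (the property types, the exception choice, and the virtual members a proxy-based O/RM would later want) is purely illustrative.

    using System;

    public class WorkItem
    {
        public virtual int Id { get; set; }
        public virtual string Description { get; set; }
        public virtual Employee AssignedTo { get; set; }
    }

    public class Employee
    {
        public virtual int Id { get; set; }
        public virtual bool Active { get; set; }

        // The domain rule expressed in code: inactive employees
        // should not be scheduled for work items.
        public virtual void ScheduleWorkItem(WorkItem item)
        {
            if (!Active)
                throw new InvalidOperationException(
                    "Inactive employees cannot be scheduled for work items.");

            item.AssignedTo = this;
        }
    }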
I like to call objects like “Employee” and “Work Item” entities. Entity is one of those terribly overloaded words with conflicting definitions depending on the context. To me, in the context of domain modeling, entity has a specific definition: A single, uniquely identified unit which represents a set of related data and behavior that is different from other single, uniquely identified units of data and behavior. Entities are very important in my systems. Their identifier never changes, but their data and behavior may, according to the rules of the problem domain.
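To make that definition concrete, the Employee sketch above could base equality solely on its identifier, since the rest of its data is allowed to change over the entity's life. This is illustrative only; a real system also needs a strategy for entities that have not yet been assigned an identifier.

    // Added to the Employee class above: two references represent the same
    // entity if and only if they carry the same identifier, regardless of
    // what the rest of the data currently looks like.
    public override bool Equals(object obj)
    {
        var other = obj as Employee;
        return other != null && other.Id == Id;
    }

    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }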
Finally, I find that languages like C#, VB, and others as well as the capabilities of the .NET Framework afford me better options for making decisions to solve domain problems and to craft and design my entities within that domain.
Secondary Source of Complexity: Persistence
Another source of complexity in my applications is the persistence and retrieval of entity data. Today, there are many options for the underlying data persistence store, ranging from object-oriented databases (OODBs) and file-based stores to cloud stores and, of course, the most prevalent: the relational database management system (RDBMS). Currently, I still prefer using RDBMSs such as Microsoft SQL Server as the underlying persistence store. RDBMSs are, as the ‘R’ implies, relational data stores. This means that the theory of impedance mismatch is in play and is a problem I must address.
When architecting a system, I have several principles I use to guide me when choosing which solution to use to mitigate the impedance mismatch problem. They are not absolute rules, though, and I sometimes do violate them if I have very good justification. Principles are, after all, meant to guide, not to dictate. My principles of persistence are:
- persistence ignorance: Since the domain is usually complex enough, I do not need to add the extra complexity of letting persistence-related concerns bleed into my domain entity logic. I keep persistence concerns, to the maximum extent possible, in the persistence layer (a small sketch after this list shows the idea). It never works out 100%, but what few compromises I must make are tolerable.
- domain concerns are the concern of the domain: It’s hard enough keeping the domain consistent and defect-free without letting domain concerns, business logic, and other such concepts escape from the domain and appear in other areas of the application such as the presentation layer, the database (SQL stored procedures), ad-hoc queries with inherent logic embedded in WHERE clauses, etc. Decisions that are the concern of the domain should flow from the domain out into other parts of the system, and those parts should respect the domain’s authority to determine how those decisions are made.
- reporting, data integration, and schema versioning are separate problem domains: This is perhaps the easiest principle to violate. It’s easy to allow complex query specifications (that is, reports) to creep into the domain. When I use the word “report”, I don’t just mean things like “ad-hoc reports” (think Crystal Reports, SQL Server Reporting Services, and similar products). I mean anything that involves submitting a query for data using more than a few simple criteria, possibly criteria specified by the user. In many cases, I will consider handling these types of queries/reports in a view, stored procedure, or some other facility of the database designed for the given type of situation. I take great care, though, before putting anything that will likely see frequent change into the database, as it is usually more difficult to handle change management, versioning, and other such concerns there. I realize that I have to balance the needs of the application against the business’s demands for frequent change and rapid deployment while maintaining high quality. These aspects can be at odds and require a high degree of discipline and consideration.
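Here is the small sketch promised above for the persistence-ignorance principle. The repository names are hypothetical and not tied to any particular framework; the point is simply that the domain defines the contract it needs, and the persistence layer supplies the implementation.

    using System;

    // Domain layer: states what it needs, knows nothing about storage.
    public interface IEmployeeRepository
    {
        Employee GetById(int id);
        void Save(Employee employee);
    }

    // Persistence layer: implements the contract with whatever data access
    // technology fits (raw ADO.NET, an O/RM such as NHibernate, etc.).
    public class SqlEmployeeRepository : IEmployeeRepository
    {
        public Employee GetById(int id)
        {
            throw new NotImplementedException("Data access code lives here, not in the domain.");
        }

        public void Save(Employee employee)
        {
            throw new NotImplementedException();
        }
    }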
For most situations, database persistence and querying is a repetitive, consistent, and well-known problem space. However, in many projects today, developers are writing data access code from scratch or re-writing it from a previous attempt on a previous project. In the .NET space, there are many tools that aid with the problems involved in ADO.NET data access. From my experience, these are mostly solved problems and it is, frankly, wasteful to write simple CRUD data access directly against ADO.NET today when these tools make it easier, safer, and better performing than what the average developer could accomplish in a reasonable period of time writing their own persistence framework.
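To illustrate how repetitive that hand-written code gets, here is roughly what a single, simple insert looks like against raw ADO.NET; the connection string handling, table, and column names are invented for the example. A mapping framework typically collapses this, along with the matching select, update, and delete, into a single call against its session or context object.

    using System.Data.SqlClient;

    public class EmployeeWriter
    {
        public void Insert(Employee employee, string connectionString)
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(
                "INSERT INTO Employees (IsActive) VALUES (@isActive)", connection))
            {
                // Parameterized to avoid SQL injection and to allow plan reuse.
                command.Parameters.AddWithValue("@isActive", employee.Active);
                connection.Open();
                command.ExecuteNonQuery();
            }
            // ...and similar boilerplate again for select, update, and delete.
        }
    }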
One genre of these tools stands out especially: object/relational mapping frameworks. In the next section, I will explain object/relational mapping (O/RM) and how O/RM tools can greatly accelerate your product development and reduce defects and complexity from your domain and applications.
The object/relational mapping frameworks (hereafter O/RMs) currently available for .NET take several different forms and, because of this, have different strengths and weaknesses. I won’t go into a full analysis and comparison because it is out of the scope of this article. There are a few features that I find particularly compelling or even mandatory in an O/RM offering I might consider. Among these are: persistence ignorance, POCO support, transparent lazy loading, and mapping separate from the model.
Persistence Ignorance and POCO Support
I’ve already mentioned persistence ignorance earlier in this article so I’ll jump right to POCO (Plain Old C#/CLR Object). POCO is important because I want to be able to use my domain entity objects without having to attach to a database. Many persistence frameworks (non-O/RM and some O/RM) in use today require objects to be connected to a database to even call their constructor. To me, this is an unacceptable requirement, as it hampers my ability to use the objects outside of a database persistence context. Unit testing the behavior in my entities would be extremely complicated, if not impossible.
POCO also represents the fulfillment of persistence ignorance in that my entities are not required to do or implement much of anything besides some baseline .NET requirements.
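Because the entities sketched earlier are POCOs, their behavior can be exercised in an ordinary unit test with no database, connection string, or mapping anywhere in sight. The test below uses NUnit purely as an illustration; any test framework would do.

    using NUnit.Framework;

    [TestFixture]
    public class EmployeeSchedulingTests
    {
        [Test]
        public void Active_employee_can_be_scheduled_for_a_work_item()
        {
            // Plain constructors: no persistence framework involved at all.
            var employee = new Employee { Active = true };
            var workItem = new WorkItem();

            employee.ScheduleWorkItem(workItem);

            Assert.AreSame(employee, workItem.AssignedTo);
        }
    }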
Transparent Lazy Loading
Most O/RMs support some form of lazy loading. Lazy loading is the concept where certain properties or related entities for a given entity are not directly loaded when the entity is loaded. Instead, they can be loaded on-demand later (triggering another database call).
Lazy loading simplifies matters on the “O” side of O/RM in that objects are easily available without having to do a lot of back-and-forth with the persistence framework to retrieve the objects needed when they’re needed. Lazy loading comes in two forms: direct and transparent. Direct means that the programmer must initiate the lazy loading directly (by calling the method “LazyLoad()” on the “Orders” property of the “Customer” entity, for example). Transparent means that the programmer can simply start enumerating over the “Orders” collection and the lazy loading will happen automatically without any separate or additional method calls on the object.
Each of these approaches has its pluses and minuses. I have found that, ultimately, transparent lazy loading works best and fits best with my other goal of persistence ignorance. If I have to call things like “LazyLoad()” on my entities, I have allowed persistence concerns to seep into my domain model. With transparent lazy loading, I can use my entities the same way in code whether they’re being serviced from the database or some other data source.
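Continuing the Customer/Orders example, here is roughly what the transparent approach looks like with NHibernate. Assume a sessionFactory and customerId already exist and that Orders is mapped as a lazy collection; the point is that nothing in this code mentions loading at all.

    using System;
    using NHibernate;

    public class OrderLister
    {
        public void PrintOrders(ISessionFactory sessionFactory, int customerId)
        {
            using (ISession session = sessionFactory.OpenSession())
            {
                var customer = session.Get<Customer>(customerId);

                // No LazyLoad() call anywhere: simply enumerating the collection
                // causes the framework to issue the second query behind the scenes.
                foreach (var order in customer.Orders)
                {
                    Console.WriteLine(order.Id);
                }
            }
        }
    }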
Of course, this doesn’t mean I get to be lazy about lazy loading and ignore the extra performance cost of a potentially unnecessary extra round trip to the database. I balance this against the need for rapid development. When I know that I will need the extra data no matter what, I can signal the persistence framework, at query time, to go ahead and eager-fetch the data that would otherwise be lazy loaded.
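When I know up front that the orders will be needed, one way to signal that to NHibernate is an HQL fetch join, which brings the customer and its orders back in a single round trip. The snippet below would sit inside the same session block as the previous sketch; the query text and names are illustrative and assume the same Customer/Orders mapping.

    var customer = session
        .CreateQuery("from Customer c left join fetch c.Orders where c.Id = :id")
        .SetParameter("id", customerId)
        .UniqueResult<Customer>();

    // The Orders collection is already populated; enumerating it now
    // triggers no additional database call.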
Mapping Separate from the Model
Another aspect of a successful O/RM, for me, is that I am able to map between the object model and the relational model separately from either. That is, the relational model should not need to know nor care about the object model and vice versa. This means no attributes on my .NET classes, and no .NET type names in the database, for example.
Thus mapping is, in and of itself, a first-class citizen. It will require maintenance along with the code and the database schema. Without the mapping, neither the objects nor the relational model work very well or accomplish much of anything. Tooling for the mapping is important for me also. The mapping must be easy to create, edit, maintain, etc.
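In NHibernate’s case, that mapping typically lives in an XML file (an .hbm.xml file) that sits alongside the code but is part of neither the classes nor the database schema. A rough sketch for the Employee entity from earlier might look like the following; the assembly, table, and column names are invented for the example.

    <?xml version="1.0" encoding="utf-8" ?>
    <hibernate-mapping xmlns="urn:nhibernate-mapping-2.2"
                       assembly="MyCompany.Domain"
                       namespace="MyCompany.Domain">
      <class name="Employee" table="Employees">
        <id name="Id" column="EmployeeId">
          <generator class="native" />
        </id>
        <property name="Active" column="IsActive" />
      </class>
    </hibernate-mapping>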
By: Chad Myers
Chad Myers is the Director of Development at Dovetail Software, Inc. in Austin, TX. He has over 10 years of software development experience creating elegant, functional, and durable Web-based enterprise software systems in Java and .NET/C#. Chad spends most of his professional time practicing Agile development techniques as well as object-oriented principles. He tries to spread as much knowledge as he has received by giving user group and conference talks, writing articles, and maintaining an active blog at http://chadmyers.lostechies.com.
One common concern about program-generated SQL (even when it uses parameterized queries) is that it does not perform as well as stored procedures. According to SQL Server 7.0 Books Online, parameterized queries are treated the same as stored procedures and benefit from the same optimization and caching. Oracle and other popular database engines have similar optimization and caching features. This doesn’t let you off the hook for executing properly constructed queries, however. NHibernate can help in this regard, but it won’t automatically optimize every database query. Lazy loading, cascading, and fetching strategies still require careful planning.