Tuesday, February 02, 2010

Dirty Data

I had a very frustrating afternoon trying to get what ought to be a relatively simple set of data out of our knowledge management database. For various reasons, when the system was created it was shaped largely by the whims of a particular manager in the organization. The result is that it is sometimes impossible to get at information that one would think ought to be a key part of the system. For example, today I wanted to pull sets of projects based on start dates and duration, and that relatively simple task took me four hours, because our system doesn't capture duration and only captures start dates accurately for about half the projects.

The important lesson reinforced by my experience today (as it frequently is when I use the system) is that database design should not be a one-person endeavor. The whims of an individual should never dictate what does, or doesn't, go into a database system like the one I'm working with.

The second lesson I learned today is that when two systems interact with one another, it's really helpful if the people on both ends have similar standards for data cleanliness and general maintenance. It's so frustrating to see record after empty record waiting for data that's supposed to be synced from a sister system, but currently isn't, because the people running that other system can't be arsed to fix their subject taxonomy.
