Data Modeling Exercise: Email Address Consolidation

As this is my initial post, I’d think that it is neccessary to preface my first post with the ever prevalent  ‘ HELLO WORLD’

With that out of the way, an interesting assignment came my way recently that I think is worth sharing.

Problem:  An organization requested a solution that would provide, for end-users (Marketing primarily), the ability to quickly answer the following questions:

What are my unique email addresses by system?
Of these email addresses which have been verified?
Which have been attached to the ‘Do Not Email’ list?
By system, who has accepted the terms of service?
Is email deliverable to the provided address?
How many users are associated with a particular domain (gmail, yahoo, etc.)?
What soliciting options, by system, did the user sign up for?
By system, is the email addresses still active?
What is the preferred email address of the account?
What email addresses have opted out of all corporate communications?

…etc

The source data is stored in relational databases (Oracle, SQL)  across multiple online and internal registration systems.

Methodology: To create a one off set of reports, queries, stored procedures, and/or ETL packages to answer each of the questions would surely be a time consuming process.  Our primary role as data architects is to answer the following questions when presented with technical challenges- what methodologies can be leveraged to bring the greatest bang for the buck to the organization, how can we provide the greatest usability to our end-users, and what needs to be done in order to provide sustainability and maintainability to our fellow developer and operations crew.

Our measure should align to these goals of cost-effectiveness, flexibility, maintainability, and speed at which we can deliver a solution.

Beyond the initial phase of gathering business needs/requirements, we begin to look into what data architecture and model best serves the organization and our systems. In the business scenario provided above, we may ask ourselves the following questions when approaching the initial design:

  • Should we be thinking about the data in terms of trending and analysis and create a dimensional model?
  • Should we provide a flattened version the data allowing the end user to pivot the data as needed?
  • Should we normalize the data so that it reflects our business rules and provides an outlet for both future dimesional modeling and flattening of the data?

To rule out the first notion- the questions posed by the marketing group do not relate to trending (gain insight on sales, subscriptions, revenue, etc.)  at the email level.  Therefore, a dimensional approach does not fit the current issue. Note, email address data could exist simply as attribute(s) of a customer or an account dimension, if needed in the future.

Although option 2 could serve as the master list and provided to the marketing group on a scheduled basis, it is not reusable by other systems and users that may benefit from a more flexible design.

Option 3 is the preferred choice as it serves as a normalized point of consolidation for all source systems. Having a centralized repository for email data will continue to serve the needs of the business and downstream processes.

Below is an example data model that could be used in this scenario.  It allows us to answer any of the proposed business problems and also provides a normalized and consolidated landing zone for our data.  If later down the road we want to use this data model as a source from something like a Customer Dimension, the hard work of attaining and cleansing and conforming the email data will already be complete.

After the data model has been finalized, work can then begin on the ETL design, which could implement a SQL MERGE/Change Data Capture approach (more on that to come)  and views to hide the complexity of the underlying database from the end-user.

Please refer to the Basic Data Dictionary for further explanation of the fields.

Feel free to drop a question or comment!
HostGator promo

Advertisements

About Sal De Loera
I have been involved in Information Technology, specifically Data Warehousing and BI, for over 8 years. Over these years I have assumed the following roles: Software Engineer Intern, Programmer/Analyst, Sr. Data Warehouse Engineer, Data Architect, Director of Information Technology, and as most recently as a Data Warehouse / BI consultant. Additionally, my IT experience spans the following industries: Insurance, Restaurant, Mass Media, Education, and Local Government. I have created this blog primarily as a means to not only share information and case studies regarding Business Intelligence, Data Warehousing, and Data Modeling, but also receive input from the community on some of the topics that I'l cover. Looking forward to your feedback and to the exciting opportunities that come with new people and technology!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: