
GDMUML

Genealogical Data Models in the Unified Modeling Language


Last update: 29-Jan-2003

Welcome!

This website is dedicated to the topic of data models used in genealogy.

First, a word about terminology. The term "object model" describes our focus more accurately than "data model", because a UML model decomposes a genealogical system into objects that have both behavior and attributes. The attributes (the data) are only one aspect of an object.
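To make the distinction between attributes and behavior concrete, here is a minimal Python sketch of a single genealogical object. The class name `Persona`, its fields, and its methods are hypothetical illustrations, not taken from any of the models discussed on this site: the attributes hold the data, while the methods supply behavior that a pure data model would not describe.

```python
class Persona:
    """A hypothetical object for a person as described in the sources."""

    def __init__(self, name: str) -> None:
        # Attributes: the "data" side of the object.
        self.name = name
        self.source_ids: list[str] = []

    def cite(self, source_id: str) -> None:
        # Behavior: the object maintains its own list of supporting
        # sources and ignores duplicate citations.
        if source_id not in self.source_ids:
            self.source_ids.append(source_id)

    def is_documented(self) -> bool:
        # Behavior that derives new information from the attributes.
        return len(self.source_ids) > 0
```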

Why study object models?

Object models are like blueprints for software. They help a software developer break complex programs down into manageable components and objects, and they help the designer visualize the relationships between those objects and what information they may share. It is much easier to draw up several alternative blueprints and weigh their pros and cons than to change the software after the program has been built.

Another important reason for working with object models is that they allow "domain experts" to influence the outcome. Here the domain expert is a professional genealogist or genealogy enthusiast who wants the software to support certain "standard practices" of the field. Engaging these experts early helps the design accommodate the needs of end users.

So what kinds of data and object models exist?

The GENTECH Lexicon Working Group developed a data model called the "GENTECH Genealogical Data Model Phase 1" (hereafter GENTECH-GDM). It is described in more detail in the sections below.

The GeniML specification, at the GeniML website, presents the "GeniML Object Model and Vocabulary". It specifies an XML vocabulary, with an XML schema as the intended eventual outcome.

The GenXML specification, at the GenXML website, defines a file format for exchanging genealogical data. Its XML schema is influenced by the GENTECH Genealogical Data Model.

The gdmxml specification, at the gdmxml website, defines an XML implementation of the GENTECH Genealogical Data Model. "Specifically, it is a RELAX NG Schema to validate XML documents with genealogical information according to the Genealogical Data Model put together by the Lexicon Group from GENTECH."

The GEDCOM Standard Release 5.5 (unofficial HTML version) can be viewed at Paul B. McBride's Genealogy website. "GEDCOM was developed by the Family History Department of The Church of Jesus Christ of Latter-day Saints (LDS Church) to provide a flexible, uniform format for exchanging computerized genealogical data." This is the GEDCOM interchange format in common use today.

The GEDCOM XML Specification Release 6.0 is available from the LDS Church as the file GedXML60.pdf. It defines a DTD for GEDCOM content and is considered a "beta" version; it was released December 6, 2002.

GENTECH-GDM Background

Version 1.0 of the GENTECH Genealogical Data Model was released August 21, 1998 as an RFC (Request for Comment) at the Federation of Genealogical Societies Conference. Version 1.1 of the model was released on May 29, 2000. This is the current specification and is available at the GENTECH website.

The data model is described using Entity Relationship Diagrams (ERDs), and its terminology is heavily influenced by relational database concepts. Much of the early work on the model took place in 1996, before UML (Unified Modeling Language) was widely adopted as the standard object modeling language.

The purpose of the model was to "define genealogical data for the purpose of facilitating data exchange among genealogists." The result is a logical model, which describes the conceptual relationships of genealogical data; it is not an implementation model from which software could be built directly.

GENTECH-GDM in UML

I have recast the GENTECH-GDM in UML terms.
The by-product of this analysis is a GENTECH-GDM Reference Model which is available in PDF format. It describes classes of the Evidence, Administrative, and Conclusional sub-models and has detailed class diagrams for each sub-model. This is a "strict" interpretation of the original GENTECH Entity-Relationship model. The entity names and attributes are carried over to the UML classes. A separate section extends the model with a suggested sub-model for Dates.

GENTECH-GDM Reference Model (in UML): gdmref-01.pdf (920 kb).
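As an illustration of carrying entity names over to UML-style classes, the following Python sketch models an assertion that links two subjects and cites a source. The class names echo GENTECH-GDM entities (PERSONA, SOURCE, ASSERTION), but the attribute sets are deliberately simplified and hypothetical, as is the helper `assertions_citing`; consult the reference model for the actual attributes. Where a relational implementation of the ERD would use foreign-key columns, these classes hold direct object references.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    # Simplified: the real SOURCE entity carries more attributes.
    source_id: str
    title: str


@dataclass
class Persona:
    persona_id: str
    name: str


@dataclass
class Assertion:
    # An assertion relates two subjects (here both personas) and records
    # which source supports the claim. In a relational implementation
    # these links would be foreign keys; here they are object references.
    subject1: Persona
    subject2: Persona
    value: str
    source: Source
    surety: Optional[str] = None  # hypothetical free-text surety label


def assertions_citing(assertions: list[Assertion], source_id: str) -> list[Assertion]:
    """Return the assertions supported by the given source (hypothetical helper)."""
    return [a for a in assertions if a.source.source_id == source_id]
```

Holding references rather than key values is the main shift from the relational vocabulary of the original ERD toward an object model.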

GDM Sandbox

Development is underway on a "GDM Sandbox" MS-Windows application. It will use an XML file to serialize and deserialize the contents of a GDM project. The plan is to implement each submodel as a separate stage:

Once the major sub-models are in place, import/export interfaces for other file formats, along with other features, can be added. As stages are completed, they will be posted here for feedback and comment.
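A minimal sketch of serializing a project to XML and reading it back, using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`gdmProject`, `persona`, `id`, `name`) and the functions `save_project` and `load_project` are hypothetical and do not reflect the actual GDM Sandbox file format; the point is only that each object becomes an XML element and can be restored unchanged.

```python
import xml.etree.ElementTree as ET


def save_project(personas: dict[str, str]) -> str:
    """Serialize a mapping of persona id -> name to an XML string."""
    root = ET.Element("gdmProject")  # hypothetical root element name
    for pid, name in personas.items():
        # Each persona becomes one child element with id/name attributes.
        ET.SubElement(root, "persona", id=pid, name=name)
    return ET.tostring(root, encoding="unicode")


def load_project(xml_text: str) -> dict[str, str]:
    """Deserialize an XML string produced by save_project."""
    root = ET.fromstring(xml_text)
    return {el.get("id"): el.get("name") for el in root.findall("persona")}
```

Round-tripping a mapping such as `{"P1": "John Smith"}` through `save_project` and `load_project` returns the same mapping; `ElementTree` escapes characters such as `&` in attribute values automatically.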

The current draft of the GDMUML specification is available through this link: GDMUML-0.4.
The 0.4 revision describes the classes of the Evidence, Administrative, and Conclusional submodels and includes class and object diagrams. (This document will evolve to become the implementation specification for GDM Sandbox.)

Contact Information

Please send comments and suggestions for this website to:
Stanley Mitchell (stanlmitchell@yahoo.com)

GDMUML Mailing List

The topic of this list is genealogical data models in the Unified Modeling Language (UML). Currently, discussion focuses on the GENTECH genealogical data model. This forum should be of interest to software developers and genealogical method researchers.