Metadata-driven Clinical Data Loading into i2b2 for Clinical and Translational Science Institutes


ACTSI investigators and leaders Andrew Post, MD, PhD, associate professor of biomedical informatics at Emory University School of Medicine, and David Stephens, MD, interim dean of Emory School of Medicine and vice president for research at Woodruff Health Sciences Center, recently published their work on i2b2 in the AMIA Joint Summits on Translational Science Proceedings (AMIA Summit on Translational Science).

i2b2 (Informatics for Integrating Biology and the Bedside) is a research data warehouse. It gathers patient information from a variety of sources, such as electronic health records, lab results, and genetic and research data, as well as birth registries and government data such as Medicaid. It cleans and anonymizes the data, which users of the system can then query. i2b2 is a self-service tool for exploring clinical data that has already been gathered into the system. This is helpful when applying for a grant, when an investigator needs to know how many patients are available within, for example, Emory Healthcare. Users can include or exclude criteria to arrive at an accurate patient count for a study (for instance, male patients with adult diabetes).
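
As a rough illustration of how include and exclude criteria narrow a patient count, the Python sketch below counts records in a small, made-up data set. The record fields and the count_patients helper are hypothetical; they are not part of i2b2, which exposes this kind of query through its own interface.

```python
# Hypothetical, de-identified records; field names are illustrative only.
patients = [
    {"id": 1, "sex": "M", "age": 54, "diagnoses": {"diabetes"}},
    {"id": 2, "sex": "F", "age": 61, "diagnoses": {"diabetes"}},
    {"id": 3, "sex": "M", "age": 47, "diagnoses": {"hypertension"}},
    {"id": 4, "sex": "M", "age": 16, "diagnoses": {"diabetes"}},
]


def count_patients(records, include, exclude=()):
    """Count records that satisfy every include test and no exclude test."""
    return sum(
        1
        for record in records
        if all(test(record) for test in include)
        and not any(test(record) for test in exclude)
    )


def is_male(record):
    return record["sex"] == "M"


def has_diabetes(record):
    return "diabetes" in record["diagnoses"]


def is_minor(record):
    return record["age"] < 18


# Male patients with diabetes, excluding anyone under 18.
cohort_size = count_patients(
    patients, include=[is_male, has_diabetes], exclude=[is_minor]
)
print(cohort_size)
```

Keeping each criterion as a separate test means criteria can be added or dropped without rewriting the counting logic, which mirrors the include/exclude style of query described above.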

Typically, data is loaded into data warehouses through Extract, Transform, and Load (ETL) software. This involves (1) gathering the data from sources, (2) transforming it to the proper format and structure, and then (3) loading it into the final database. The process consumes a lot of researchers’ time and often becomes complicated when data comes from a variety of sources, each supported by a different vendor or hosted on separate computer hardware.
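
The three ETL steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how i2b2 or any particular ETL product works: the CSV export, the lab_fact table, and the unit handling are all invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source export; a real source would be an EHR or lab system.
SOURCE_CSV = """patient_id,test,value,unit
101,glucose,5.4,mmol/L
102,glucose,110,mg/dL
"""


def extract(text):
    """Step 1: gather raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Step 2: convert types and report every glucose value in mg/dL."""
    facts = []
    for row in rows:
        value = float(row["value"])
        if row["unit"] == "mmol/L":
            value *= 18.0  # approximate glucose conversion factor (assumption)
        facts.append((int(row["patient_id"]), row["test"], round(value, 1), "mg/dL"))
    return facts


def load(conn, facts):
    """Step 3: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lab_fact "
        "(patient_id INTEGER, test TEXT, value REAL, unit TEXT)"
    )
    conn.executemany("INSERT INTO lab_fact VALUES (?, ?, ?, ?)", facts)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
load(conn, transform(extract(SOURCE_CSV)))
rows = conn.execute("SELECT * FROM lab_fact ORDER BY patient_id").fetchall()
print(rows)
```

Even in this toy version, the transform step has to know the quirks of its one source. Each additional source with its own column names and units would need more hand-written code, which is where the time cost described above comes from.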

Post and Stephens sought to remedy this exhausting and expensive problem by implementing Eureka!’s metadata-driven ETL process to create a faster, more streamlined system. Metadata can be thought of as stored information about data: it describes the relationships among data elements, reports, and processes, and it supports updates to the system itself. Describing sources this way also makes it easier to support a variety of data sources and formats.
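
To give a sense of what "metadata-driven" means in practice, the sketch below keeps the description of each source in a small table of metadata and uses one generic function to apply it. The mapping format, source names, and apply_mapping helper are hypothetical and are not Eureka!'s actual configuration or API.

```python
# Hypothetical metadata: for each source, which source column feeds each
# target field. This is an illustration, not Eureka!'s real format.
MAPPINGS = {
    "hospital_a": {"patient_id": "mrn", "code": "dx_code", "date": "visit_date"},
    "hospital_b": {
        "patient_id": "PatientNumber",
        "code": "Diagnosis",
        "date": "EncounterDate",
    },
}


def apply_mapping(source_name, row, mappings=MAPPINGS):
    """Rename one source row into the target schema using only the metadata."""
    mapping = mappings[source_name]
    return {target: row[source] for target, source in mapping.items()}


# Made-up rows from two sources that name their columns differently.
row_a = {"mrn": "A-17", "dx_code": "E11.9", "visit_date": "2015-03-02"}
row_b = {"PatientNumber": "B-42", "Diagnosis": "I10", "EncounterDate": "2015-04-11"}

facts = [apply_mapping("hospital_a", row_a), apply_mapping("hospital_b", row_b)]
for fact in facts:
    print(fact)
```

Under this design, supporting a new source means adding an entry to the metadata rather than writing new loading code, which is the kind of saving a metadata-driven approach aims for.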

“While i2b2’s traditional ETL is usually an arduous, time-consuming effort, moving to a metadata-driven system like Eureka! Clinical Analytics will enable cross-institutional efforts to join data networks, as well as enhance the current infrastructure to support multicenter clinical research,” said Post. “This, in turn, helps to expedite the scientific process by downsizing the unnecessary work associated with creating research data marts and giving proper estimates of patient counts for IRB protocols and grant submissions.”
