InsideTraCS: Get to know your extended research team through a new series featuring conversations with faculty and staff.
Emily Pfaff, PhD, is Co-director of Informatics and Data Science at NC TraCS. She has a PhD in Health Informatics and a master’s in Information Science, both from UNC. Her primary expertise and research interests are in “computable phenotyping” and clinical data modeling in support of translational research.
Marla Broadfoot, NC TraCS science writer, recently spoke with Pfaff about improving and refining health data collection and analysis, COVID research, and the role of informatics in solving today’s pressing health problems.
Last September, you earned your PhD in Health Informatics from the Carolina Health Informatics Program (CHIP). Congratulations, by the way! What drew you to health informatics?
I graduated from undergrad with a degree in Russian history—and you won’t be shocked to hear that it was pretty hard to find a job in that field during a recession. When I decided to change course and go back to school, I knew I was interested in information science but was somewhat lacking in direction. Luckily, I ended up getting a graduate research associate position in Psychiatry helping to build study databases, and I was hooked; I loved the idea of using my computing skills to actually impact patient health. An internship at TraCS turned into a position at TraCS, which turned into … ten years later and a PhD. And I still love the idea that computing can impact health! So in a way, health informatics found me rather than the other way around.
What was your dissertation on, in layman’s terms? Are you continuing this work in any way?
My dissertation focused on methods of “computable phenotyping,” or how one translates inclusion/exclusion criteria into code in order to define a patient cohort, or group of study subjects. I experimented with a new method to help ensure that the code-based version of the cohort definition gets as close as possible to the definition of that cohort in the clinician’s mind. As computable phenotyping is so fundamental to the field of clinical informatics, I am definitely continuing this work!
How is your role at TraCS changing, if at all, now that you have a PhD?
Somewhat different, somewhat the same. I co-direct the Informatics and Data Science service with Ashok Krishnamurthy, along with Associate Director Kellie Walters. We approach most things as a team (and have a simply awesome team of analysts, developers, and project managers who do so much of the heavy lifting), so in a way things are operating much as they always did. But I will have some new opportunities to lead projects as a PI, which opens some doors. I’m excited to see what comes next.
I see that you are involved in the National COVID Cohort Collaborative. Could you give me a sense of the scale of the project and what it is hoping to accomplish?
N3C is an NIH-funded initiative to bring together massive amounts of electronic health record data for patients across the US, and then use those data (and that huge sample size) to answer important COVID research questions. Any researcher can request approval to use the data, which makes it a wonderfully open and democratic resource for doing data-driven COVID research. We currently have about 10 billion rows of data, for almost 9 million patients. Many aspects of working on N3C, especially all the wonderful new colleagues I’ve met, have made it one of the highlights of my career. At the same time, the circumstances that led to its creation are tragic. I always strive to remember that the clinical data we are entrusted with are not just numbers and strings; those data represent real people, many of whom lost their lives in the pandemic. It’s important to maintain that perspective.