Microsoft® Certified Partner | GSA Schedule Contract GS-35F-0935N | NIH BPA (National Institutes of Health Blanket Purchase Agreement) | Discovery Logic
A Thomson Reuters Company
Identifying the Value of an Idea - Turning Data Into Knowledge
The Enumeration Project
Discovery Logic was challenged to produce an automated system for tracking how many key personnel NIH supports across its large universe of extramural grants. This project is just what it sounds like: a complex challenge in counting. Discovery Logic built the solution using the Microsoft .NET Framework, SQL Server, Oracle, and the AdLib Express Server.


Over the years, NIH has experimented with several techniques to determine how many researchers it supports on research grants: not only principal investigators (PIs), but also varying numbers of co-investigators, postdoctoral fellows, students, and others. Previous enumeration efforts were labor-intensive and captured little detail about the researchers' names or other characteristics. In 2007, NIH asked Discovery Logic to develop a more accurate, automated system to count the researchers supported by NIH research grants and, when possible, to link their names with person profiles in the IMPAC II database.

An early NIH enumeration effort was the “Staffing Patterns” report issued in March 1993, which sampled R01 and P01 grants only. The study estimated that NIH supported 55,480 FTEs in FY 1990, an average of 3.3 FTEs per grant. A second enumeration exercise was done in March 2003, based on a “convenience sample” that depended on paper forms submitted by each of NIH's Institutes and Centers (ICs). This study estimated that the number of “key personnel” was 207,711. These early efforts did not go beyond estimating numbers of key personnel records and did not address the nontrivial problem of multiple appearances of the same people on different research projects.

In July 2007, Discovery Logic, under contract to NIH, began “Enumeration Exercise #3” to determine the number of key personnel supported by NIH grants, using automated methodologies for increased precision and recall. The study used the “grant progress report” filed annually by each PI as the primary data source. Scanned images of these reports were not available when the previous studies were performed. In addition to counting personnel, Discovery Logic designed an automated system to extract and collect names, ages, degrees, and roles on the project, providing further insight into the research workforce. This process included all research activity grant types and covered more than 95% of all progress reports. Manual verification indicated an accuracy of 90% for the automated extraction process. We also developed a weighted-similarity algorithm across the name, SSN, and date-of-birth fields to account for multiple appearances of an individual across grants. This conservative algorithm reduced over-counting of people by more than 30%, and it also provided a means to link records between years and to link records back to IMPAC II profiles.
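
The weighted matching idea described above can be illustrated with a minimal Python sketch. This is not Discovery Logic's implementation: the field weights, the match threshold, the record layout, and the function names below are all hypothetical, and the source does not publish the real values. The sketch scores each pair of records on name, SSN, and date of birth, treats a missing field as no evidence (one way to keep the matching conservative), and merges only pairs that clear the threshold.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional

# Hypothetical weights and threshold, chosen only for illustration.
WEIGHTS = {"name": 0.4, "ssn": 0.4, "dob": 0.2}
MATCH_THRESHOLD = 0.75


@dataclass(frozen=True)
class PersonRecord:
    """One person entry extracted from a progress report (assumed layout)."""
    name: str
    ssn: Optional[str] = None  # may be absent from a report
    dob: Optional[str] = None  # ISO date string, may be absent


def name_similarity(a: str, b: str) -> float:
    """Fuzzy name similarity in [0, 1], tolerant of initials and typos."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def weighted_similarity(r1: PersonRecord, r2: PersonRecord) -> float:
    """Weighted sum of per-field scores; a missing field contributes zero."""
    score = WEIGHTS["name"] * name_similarity(r1.name, r2.name)
    if r1.ssn and r2.ssn and r1.ssn == r2.ssn:
        score += WEIGHTS["ssn"]
    if r1.dob and r2.dob and r1.dob == r2.dob:
        score += WEIGHTS["dob"]
    return score


def count_distinct_people(records: List[PersonRecord]) -> int:
    """Merge matching records with union-find and count the clusters."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if weighted_similarity(records[i], records[j]) >= MATCH_THRESHOLD:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(records))})


records = [
    PersonRecord("Jane Q. Smith", "123-45-6789", "1970-01-02"),
    PersonRecord("Jane Smith", "123-45-6789", "1970-01-02"),
    PersonRecord("Jane Smith"),  # name only: not enough evidence to merge
    PersonRecord("John Doe", "987-65-4321", "1980-05-06"),
]
distinct = count_distinct_people(records)
```

In this toy data the first two records merge because SSN and date of birth agree and the names are close, while the name-only record stays separate, so four records yield three people. Under these assumed weights a name match alone can never reach the threshold, which errs toward over-counting rather than wrongly merging two different people. The pairwise loop is quadratic and is shown only for clarity; a real system at NIH scale would need some form of blocking or indexing.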

Discovery Logic demonstrated that data from this project can also be used to support workforce analysis, a benefit that NIH did not anticipate. Until now, the NIH IMPAC II database has allowed analysis only of PIs on research and fellowship grants. The Enumeration personnel database provides information on graduate students, postdocs, and research staff who were previously invisible.

For additional information, please see: "How Many Scientists Do the NIH Support? Improving Estimates of the Workforce."

1375 Piccard Drive, Rockville, MD 20850 | 800-467-4009 | 240-912-1900

Discovery Logic and Thomson Reuters are registered trademarks of Thomson Reuters
© 2010 Thomson Reuters