Discovery Logic was challenged to produce an automated system for tracking how many key
personnel NIH supports across its large universe of extramural grants. This project is
just what it sounds like: a complex challenge in counting. Discovery Logic built the
solution using the Microsoft .NET Framework, SQL Server, Oracle, and the AdLib Express Server.
Over the years, NIH has experimented with several techniques to determine how many
researchers it supports on research grants: not only principal investigators (PIs),
but also varying numbers of co-investigators, postdoctoral fellows, students, and others.
Previous enumeration efforts were labor-intensive and captured little detail about the
names or other characteristics of the researchers. In 2007, NIH asked Discovery Logic to develop
a more accurate and automated system to count the number of researchers supported by
NIH research grants and, when possible, to link names with person profiles in the
IMPAC II database.
An early NIH enumeration effort was the Staffing Patterns report issued in March 1993,
which sampled R01 and P01 grants only. The study estimated that NIH supported 55,480 FTEs
in FY 1990, an average of 3.3 FTEs per grant. A second enumeration exercise was done in
March 2003, based on a convenience sample that depended on paper forms submitted by
each of NIH's Institutes and Centers (ICs). This study estimated that the number of key
personnel was 207,711. These early efforts did not go beyond estimating numbers of key
personnel records and did not address the nontrivial problem of multiple appearances of
the same people on different research projects.
In July 2007, Discovery Logic, under contract to NIH, began Enumeration Exercise #3
to determine the number of key personnel supported by NIH grants using automated
methodologies for increased precision and recall. The study used the grant progress
report filed annually by each PI as the primary data source. Scanned images of these
reports were not available at the time the previous studies were performed. In addition to
counting personnel, Discovery Logic designed an automated system to extract and collect
names, ages, degrees, and roles on the project, providing further insight into the research
workforce. This process included all research activity grant types and counted over 95% of
all progress reports. Manual verification indicated an accuracy of 90% for the automated
extraction process. We also developed a weighted similarities algorithm across name, SSN,
and date of birth fields to account for multiple appearances of an individual across grants.
This conservative algorithm reduced over-counting of people by more than 30%, and also provided
a means to link records between years and to link records back to IMPAC II profiles.
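The weighted-similarity matching described above can be illustrated with a minimal Python sketch. It is not Discovery Logic's actual implementation: the field weights, the match threshold, the use of difflib string similarity, and the greedy grouping strategy are all assumptions made for illustration, and the names WEIGHTS, MATCH_THRESHOLD, record_similarity, and group_people are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical weights and threshold; the source does not state the real values.
WEIGHTS = {"name": 0.5, "ssn": 0.3, "dob": 0.2}
MATCH_THRESHOLD = 0.85  # set high so that only strong matches are merged


def field_similarity(a, b):
    """Return a 0..1 string similarity; a missing value scores 0 (conservative)."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1, r2):
    """Weighted sum of per-field similarities across name, SSN, and date of birth."""
    return sum(
        weight * field_similarity(r1.get(field), r2.get(field))
        for field, weight in WEIGHTS.items()
    )


def group_people(records):
    """Greedily group records that appear to describe the same person.

    Each record is compared with the first record of every existing group and
    joins the first group whose similarity reaches the threshold; otherwise it
    starts a new group. The number of groups is the de-duplicated head count.
    """
    groups = []
    for rec in records:
        for group in groups:
            if record_similarity(rec, group[0]) >= MATCH_THRESHOLD:
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups


# Illustrative, made-up records: the first two differ only in a middle initial.
sample = [
    {"name": "Jane Q. Smith", "ssn": "123-45-6789", "dob": "1970-01-02"},
    {"name": "Jane Smith", "ssn": "123-45-6789", "dob": "1970-01-02"},
    {"name": "John Doe", "ssn": "987-65-4321", "dob": "1980-05-06"},
]
people = group_people(sample)
print(len(sample), "records ->", len(people), "people")
```

Because a missing field contributes zero to the score in this sketch, a record that matches on name alone never reaches the threshold. That is one way to keep the matching conservative: some true duplicates are left unmerged, but distinct people are less likely to be collapsed into one.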
Discovery Logic demonstrated that data from this project can also be used to support
workforce analysis, a benefit that NIH did not anticipate. Indeed, until now, the NIH
IMPAC II database has allowed only for analysis of PIs on research and fellowship grants.
The Enumeration personnel database provides information on graduate students, postdocs,
and research staff who were previously invisible.
For additional information please see: "How Many Scientists Do the NIH Support? Improving Estimates of the Workforce."