Abstract: Legacy data, that is, data collected or compiled in the past and stored in obsolete formats (e.g., paper and floppy disks), can be made accessible through data rescue, a process that may be as simple as scanning documents or, to achieve machine readability, as complex as developing artificial intelligence algorithms. Machine readability not only facilitates the discoverability of data and their reuse for new purposes but also expands the coverage of datasets and supports access for screen reader users. At the National Transportation Library, legacy data exist primarily as scanned-to-PDF documents that, although digitized, are not machine-readable due to the quality of scanning. With Adobe Acrobat Pro’s text-recognition function, the Library’s current process for rescuing machine-readable data from PDF images has an error rate exceeding 25%. Consequently, in an average table of data, at least 25% of cells require manual correction, which, though feasible for occasionally rescuing small datasets, is infeasible for large-scale data rescue. However, an informal survey of other federal information agencies and a literature review revealed that lower error rates are possible with alternative methods of data rescue, including using commercial optical character recognition software and contracting external data rescue service providers.
-
Content Notes: Poster presented at the Transportation Research Board 99th Annual Meeting: P20-20654.