If you have a very large set of PDFs and you’re uncertain which files have searchable text, you can set up an action in Adobe Acrobat that uses the Preflight tool to find which files contain no, or very few, text objects.
In the Actions Wizard, add the Preflight option from the Document Processing menu.
Click on ‘Specify Settings’ for the Preflight action, and in the dialog box select the option for ‘Acrobat Pro DC 2015 Profiles’.
Then, in the long menu to the right, select the option to ‘List page objects, grouped by type of object’.
Choose the option to create a report for either successes or errors, set a folder for these reports, and check the box to display a summary PDF.
Also choose the option in the Save & Export menu to save each file processed by Preflight. Click on the icon to the right to get the option to set a specific local folder to save the reports to.
When it’s run, the action will give you the option to add multiple files.
The action will generate a PDF portfolio with multiple PDFs for each original PDF. Select all of the PDFs in the portfolio, and then right-click and select the option to extract each PDF from the portfolio.
Then combine the reports into a single PDF file, and save the text of the report to a text file.
Open the text file in a text editor, and run a find and replace to make sure that the captions ‘File name:’, ‘Path:’, ‘Text Objects’, and ‘Vector Objects’ each appear at the beginning of a new line.
Then paste the text into column A of an Excel spreadsheet. In column B enter this formula:
. . . start in cell B2, and then pull down using CTRL + D. In cell C2 do the same with this formula:
Note that in the reports the text object count for a file is listed before the file name.
When you filter for any entries in column C, you’ll see how many text objects are in each file.
Keep in mind that a file with a lot of text that still needs to be OCR’d may have a few text objects used in an exhibit slip sheet, headers, footers, and so forth. Review any file that has a small number of text objects relative to its overall page count.
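If you would rather skip the spreadsheet step, the same tally can be sketched in a few lines of Python. This is only a rough illustration, not the Excel formulas used above. It assumes the cleaned-up text file has each caption at the start of its own line, that the count follows the ‘Text Objects’ caption on the same line (for example ‘Text Objects: 12’), and that a file with no ‘Text Objects’ line before its ‘File name:’ line has zero text objects. The function names and the threshold of 5 are hypothetical.

```python
import re


def text_object_counts(report_text):
    """Map each file name in the report text to its text object count.

    Assumes the count for a file is listed before its 'File name:' line,
    and that a missing 'Text Objects' line means zero text objects.
    """
    counts = {}
    pending = None  # latest 'Text Objects' count not yet tied to a file
    for line in report_text.splitlines():
        line = line.strip()
        match = re.match(r"Text Objects\D*(\d+)", line)
        if match:
            pending = int(match.group(1))
        elif line.startswith("File name:"):
            name = line[len("File name:"):].strip()
            counts[name] = pending if pending is not None else 0
            pending = None  # reset for the next file's section
    return counts


def low_text_files(counts, threshold=5):
    """Return the sorted names of files with fewer than `threshold` text objects."""
    return sorted(name for name, n in counts.items() if n < threshold)
```

To use it, read the text file into a string, pass it to text_object_counts, and review whatever low_text_files returns against each file's page count. If your reports lay out the counts differently, the regular expression will need to be adjusted.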
One of the nice features of CaseMap is that by default it generates OCR text for any documents that have been linked to objects. When any user works in CaseMap it will create search text for all other users to utilize. On the Case Tools tab, in the Case Index menu, you’ll find the option to review the OCR status of any document linked in CaseMap.
The ‘Not Indexed’ tab will indicate which files are waiting to be OCRed, and which cannot be OCRed.
The first tab indicates which files have yet to be added to CaseMap’s search index.
If no user has the OCR function enabled, a note will be displayed in this dialog box that someone needs to enable it.
If you get this error message when you attempt to run a full text search . . .
. . . it means that you don’t have write permission for the ‘Index’ subfolder in the CaseMap network folder.
Someone with rights to this folder will have to update the search index.
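One quick way to confirm whether your own account can write to a folder is to try creating a temporary file in it. The Python sketch below does that. It is a general-purpose check, not anything built into CaseMap, and the function name is hypothetical.

```python
import tempfile


def can_write_to(folder):
    """Return True if a temporary file can be created in `folder`.

    The temporary file is deleted automatically as soon as it is closed.
    """
    try:
        with tempfile.TemporaryFile(dir=folder):
            pass
        return True
    except OSError:
        # Covers a missing folder as well as a permission failure.
        return False
```

Call it with the path to your case’s ‘Index’ subfolder. A result of False means either the folder could not be found or your account cannot write to it.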
I’m on vacation this week, so I’ll just post a quick note about working while traveling. The Acela train from New York City to Washington, D.C. has no shortage of business travelers, many of them attorneys. If you were wondering whether it’s practical to work on a laptop during the trip, unfortunately the answer is no. I tried getting online first with my T-Mobile hotspot, and then with Amtrak’s WiFi. While I was able to reach several web sites, they loaded slowly. I reached the remote desktop, and was able to open documents, but the lag in selecting and opening emails was too long to make the effort worthwhile.
When I first connected with my smartphone’s hotspot, I had to reboot before being able to get Windows 10 to connect with WiFi.
If you’re very patient you’ll be able to get some work done, but it’s a struggle.
The Tip of the Night for April 28, 2022 discussed how degaussing disrupts the magnetic field of hard drives and tapes, erasing stored data. High-volume degaussing devices exist which can erase hundreds of devices in an hour. Data Security, Inc. markets a machine which can both degauss and physically destroy drives fed into it on a conveyor belt.
Keep in mind that a solid state drive cannot be degaussed, because it does not store data magnetically.