From 1,000+ pages to 72 relevant ones.
How to extract, re-bind, and annotate the pages you actually need.
Project
I’m defending a client who is accused of neglecting a child in his/her care. Social workers were involved with the family for a period of time and we’ve been given 1,000+ pages of records from the social workers.
My task is to go through these records looking for indicators that the child was, in fact, being well looked after.
That sounds simple enough but I’m working in a team and I know I’ll have to:
1. Distribute my work output to the team;
2. Make it easy for the team to see the records I have selected;
3. Stand over the choices I make;
4. Present the same work output to a judge in court later in the case.
If I were doing this with paper documents then Item No. 1 would involve a lot of photocopying, stapling/binding, packaging, and posting. Item No. 2 would probably involve using a highlighter on every copy I make.
I’m not doing it with paper documents though, and I’m documenting my process here to show you how to do it and/or to show you what you’re missing 😁.
Technology #1 - Review & Notes
Optical character recognition isn’t going to help me very much here as a lot of the records are handwritten. However, for those records that are in typed format, OCR may come in handy at a later stage. I’ll come back to that.
I have the records in digital format and I’m going to keep them in digital format and review them on screen.
As I come across relevant records I’ll make a note of the PDF page number in a digital notebook (OneNote).
Methodology
I’ll do a first pass through the records to get a rough idea of what I have, making rough notes (OneNote) as I go.
I’ll then do a second pass and look for specific references that help with the points we want to make. This time I’ll make more detailed notes: the date of the document, the general sense of the content, and the start and end pages of the individual document/report.
I’ll need these start/end pages for the extraction process a little later.
After the second pass I’ll go through my notes and put the documents in date order. This helps me to ensure that I’m covering as wide a period as possible, and not (for example) selecting a cluster of documents relating to a narrow time period.
If I choose to include/exclude a document for a specific reason, I’ll make a short note in OneNote about why I made that decision. All of these notes are going to be helpful for Item No. 3 (standing over the choices I make), if that ever becomes an issue.
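If you prefer something more structured than free-form notes, the second-pass notes can be kept as simple records. What follows is a minimal sketch in Python, not part of my own OneNote process: the `RecordNote` fields, the helper functions, and the sample entries are all hypothetical. It shows how the date, description, start/end pages, and reason could be stored, put in date order, and checked for long gaps in coverage.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RecordNote:
    """One second-pass note about a single document in the source PDF."""
    doc_date: date      # date of the document
    description: str    # general sense of the content
    start_page: int     # first PDF page (1-based, as shown in the viewer)
    end_page: int       # last PDF page (1-based, inclusive)
    reason: str = ""    # why the document was included or excluded

    @property
    def page_count(self) -> int:
        return self.end_page - self.start_page + 1


def in_date_order(notes):
    """Return the notes sorted oldest first."""
    return sorted(notes, key=lambda n: n.doc_date)


def largest_gap_days(notes):
    """Longest gap, in days, between consecutive selected documents."""
    ordered = in_date_order(notes)
    gaps = [(b.doc_date - a.doc_date).days for a, b in zip(ordered, ordered[1:])]
    return max(gaps, default=0)


# Hypothetical entries, for illustration only.
notes = [
    RecordNote(date(2020, 9, 14), "Home visit report", 412, 419, "Typed, clear wording"),
    RecordNote(date(2020, 1, 6), "School liaison letter", 88, 90),
    RecordNote(date(2020, 3, 2), "Case conference minutes", 130, 141),
]

for note in in_date_order(notes):
    print(note.doc_date, note.description, f"pp. {note.start_page}-{note.end_page}")
print("Total pages selected:", sum(n.page_count for n in notes))
print("Largest gap (days):", largest_gap_days(notes))
```

A large gap between consecutive dates is a prompt to go back and look for documents from that period, rather than proof that something is missing.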
I’ll then:
Extract the pages I want;
Create a summary document containing the relevant extracts;
Merge the documents into a digital booklet with page numbers, a Table of Contents, and internal bookmarks;
Annotate the booklet to show clearly where the relevant extracts can be found;
Distribute the annotated booklet to the team.
Technology #2 - Extraction
Now that I’ve identified the pages that I want to extract, that’s my next step. In this case, rather than extracting all of the required pages in one pass, I’m going to extract each document individually. The reason I do this is so that I end up with a set of individual documents, rather than ending up with a single PDF containing all of the required documents. You’ll see why later.
To do the extraction, I simply drag the 1,000+ page PDF into the extraction software and enter the start and end pages of the document I want. Once the document is extracted I rename it by date and description. I then repeat the extraction process for the other documents (there are only 9 in total; with many more I might have considered extracting everything in one pass).
At this point I have a set of individual documents, named by date and description, and I’ll move on to the next step.
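For anyone scripting this step instead of using drag-and-drop software, two small details are easy to get wrong: most PDF libraries count pages from zero while the viewer counts from one, and file names only sort chronologically if the date comes first in year-month-day form. The sketch below is an illustration under those assumptions; the function names are hypothetical and it does not call any particular PDF library.

```python
import re
from datetime import date


def page_indices(start_page: int, end_page: int) -> list[int]:
    """Convert a 1-based inclusive page range to zero-based page indices."""
    if start_page < 1 or end_page < start_page:
        raise ValueError(f"invalid page range: {start_page}-{end_page}")
    return list(range(start_page - 1, end_page))


def extract_filename(doc_date: date, description: str) -> str:
    """Build a file name that sorts by date, then description."""
    # Lower-case the description and collapse anything that is not a
    # letter or digit into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{doc_date.isoformat()}_{slug}.pdf"


# Hypothetical document occupying PDF pages 130 to 141.
indices = page_indices(130, 141)
print(len(indices), "pages, zero-based indices", indices[0], "to", indices[-1])
print(extract_filename(date(2020, 3, 2), "Case conference minutes"))
```

The list returned by `page_indices` is what you would pass to whichever PDF library you use to copy the pages out.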
Technology #3 - Optical Character Recognition
Several of the extracted documents are in typed format, so I’m going to run them through my OCR software. This will render the text searchable and selectable.
I’m not that interested in searching the documents (because I’ve already read them) but being able to select specific parts of the text will help me with two things: (1) copying text from the documents into a summary, and (2) underlining portions of the documents.
Technology #4 - Copy & Paste
I want to create a summary document that sets out, in bullet point format, the extracts from each document. I reckon this will fit in around three pages and will help the team, and the judge who ultimately receives the documentation, to understand the relevance of the documents that are presented.
In MS Word I create a new blank document and add a heading for each of the documents I am including in the booklet. I then simply copy and paste the extracts from each document under its heading as bullet points. That gives me a three-page document which I then save as a PDF for (later) inclusion in the booklet.
Technology #5 - Merge
I’m now going to create the booklet that I can (a) distribute to my team, and (b) eventually furnish to the prosecution team and to the judge.
To do this, I simply drag the documents (including the three-page summary created in the previous step) into the merging software and arrange them in the order I want them to appear in the booklet.
I want to create a Table of Contents, so I click that option in the merging software.
I want to create a bookmark tree inside the merged booklet, so I click that option in the merging software.
I then click ‘Go’ and within a second the merging software creates a merged booklet with page numbers, a hyperlinked Table of Contents, and a bookmark tree. (If you want to watch this happening in real time, watch this video I made…)
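The page numbers in a Table of Contents are just a running total of the page counts of the documents in front of each entry. As a rough sketch of that arithmetic (not how any particular merging product works internally), the hypothetical function below assumes the Table of Contents sits at the front of the booklet and is itself counted in the page numbering.

```python
def toc_entries(documents, toc_pages=1):
    """Return (title, first_page) pairs for documents merged in the given order.

    documents: list of (title, page_count) tuples in booklet order.
    toc_pages: pages taken up by the Table of Contents at the front.
    """
    entries = []
    next_page = toc_pages + 1  # first page after the Table of Contents
    for title, page_count in documents:
        entries.append((title, next_page))
        next_page += page_count
    return entries


# Hypothetical booklet: the summary first, then two extracted documents.
documents = [
    ("Summary of extracts", 3),
    ("2020-01-06 School liaison letter", 3),
    ("2020-03-02 Case conference minutes", 12),
]

for title, first_page in toc_entries(documents):
    print(f"{title} ... {first_page}")
```

The same (title, first page) pairs are what a bookmark tree needs, which is why keeping the extracted documents as separate files pays off at this stage.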
Technology #6 - Annotate
I want to keep a clean copy for eventual distribution to the prosecution team and to the judge. However, I also want an annotated copy for my own team (to clearly indicate the extracts from the documents).
I save the original booklet as the clean copy and make a second copy for me to annotate. Using my annotation software I can quickly go through the booklet and select and underline the extracts I think we should rely on. I put bright red underlining on the extracts so that the rest of my own team can find the extracts without looking too hard.
Technology #7 - Distribute
Having saved the annotated booklet, I can now send it out to the other members of the team. Depending on the size of the file, this can be done by e-mail or by sending a file-sharing link.
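The e-mail or link decision comes down to one comparison against your mail system's attachment limit. Here is a minimal sketch; the 10 MB default is an assumption for illustration only, so check the limit that applies to your own and your recipients' mail systems.

```python
def delivery_method(size_bytes: int, max_attachment_bytes: int = 10 * 1024 * 1024) -> str:
    """Return "email" if the file fits under the attachment limit, else "link"."""
    # The default limit of 10 MB is an assumed value, not a universal rule.
    return "email" if size_bytes <= max_attachment_bytes else "link"


# Hypothetical file sizes: a 4 MB booklet and a 35 MB booklet.
print(delivery_method(4 * 1024 * 1024))
print(delivery_method(35 * 1024 * 1024))
```

For a real file, the size in bytes can be read with `pathlib.Path(path).stat().st_size` from the Python standard library.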
Conclusion - Simple or Complicated?
The above processes may sound complicated but I promise you they’re not! It’s just a matter of identifying the process you need to follow, identifying the tools you need, and knowing how to use them.
None of it is rocket science and if you’re smart enough to be a lawyer then you’re smart enough to learn how to do it.
Another way to look at it is to consider the work (and the volume of paper) that would be required if you didn’t use technology to help. It’s a lot.
Hope that helps someone.
Great tips! I'm curious about the actual software you use for OCR, extraction, and annotation. I've used Acrobat Pro, but I kind of hate it. Have you found something better?