Novel Methods of Guideline Reconstruction in Forensic Underwriting

Client Challenge: 

Often overlooked, digital file receipt is an integral and increasingly technologically sophisticated aspect of Mortgage Re-Underwriting Due Diligence. It serves as the gateway through which all documents must enter the platform, and therefore has a tremendous impact on workflow. Formats vary, however, and ad hoc development for non-standard productions is common. One such production received by Oakleaf Group consisted of several thousand pages, a web-based interface, embedded documents, and hyperlink dependencies. The complexity and size of this production, compounded by metadata anomalies, resulted in multiple issues:

  • First, the metadata shuffled together the guideline and the individual program guides, which received updates concurrently and recorded their update metadata in shared columns.
  • Second, the document updates were piecemeal and asynchronous. An update to Credit policies on Monday, to Income and Asset requirements on Wednesday, and another Credit update on Friday was conceivable. To complicate matters further, there was no single update-document type index, so updates to ostensibly the same update-document were not standardized.
  • Third, would-be embedded documents used entirely different metadata notation from the update-documents, leaving their organization ambiguous.
  • Fourth, these updates included both Active and Draft versions, a distinction that was not denoted in the metadata either.
Oakleaf Approach: 

We needed to reimagine the document not as a mere collection of disaggregated updates, but as a living document, singular and dynamic, with sections swapped in and out as they were updated or became obsolete. This paradigm shift reoriented our development strategy towards compactness, linearity, and singularity.

The first step in realizing this new strategy was to tease apart the two distinct documents (the guideline and the program guides). By identifying substantively different structures in the naming conventions of the apparent Bookmarks, and checking for non-overlapping values in the rest of the fields, it became clear that the so-called ‘Replica Id’ separated the metadata into distinct major documents.
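The separation step described above can be illustrated with a minimal sketch. It assumes the metadata has been loaded as a list of dictionaries; the field names `replica_id` and `bookmark` are hypothetical stand-ins for the production's actual columns, and the overlap check is only one possible way to confirm that a candidate field really partitions the data.

```python
from collections import defaultdict


def split_by_replica_id(rows, key="replica_id"):
    """Group metadata rows by a candidate separator field (hypothetical name)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def shared_values(groups, field):
    """Return the values of `field` that occur under more than one group.

    An empty result suggests the grouping cleanly separates the documents
    with respect to that field.
    """
    seen = defaultdict(set)
    for group_id, rows in groups.items():
        for row in rows:
            seen[row[field]].add(group_id)
    return {value for value, ids in seen.items() if len(ids) > 1}
```

If `shared_values` comes back empty for the remaining fields, that supports treating each group as a separate major document.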

With the two singular documents identified in the metadata, we then needed to construct an internal document index within each of the major documents. This involved a mix of splitting and combining numeric data indexes from apparent bookmark fields. This update-document index, verified by our subject matter experts, enabled our planned substitution scheme.
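As a rough illustration of splitting and combining numeric indexes, the sketch below pulls the numbers out of one or more bookmark strings and joins them into a sortable tuple. The bookmark formats shown are invented for the example; the real fields may look quite different.

```python
import re


def section_key(*bookmark_fields):
    """Combine the numeric parts of bookmark fields into one sortable tuple.

    The bookmark formats are assumptions made for illustration only.
    """
    parts = []
    for field in bookmark_fields:
        parts.extend(int(n) for n in re.findall(r"\d+", field))
    return tuple(parts)


# Tuples of integers sort numerically, so section 10 follows section 2.
keys = sorted([section_key("03 Credit", "10.0"), section_key("03 Credit", "2.1")])
```

Using integer tuples rather than raw strings avoids the common pitfall where "10" sorts before "2".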

Next, we resolved the problem of floating document attachments by identifying a grouping index, an apparent attachment identifier field. Using this field, we pushed the constructed document index, along with the bookmark and date data from the source update-document, down to the apparent attachments.
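The push-down described above amounts to a lookup join from attachments to their parent update-documents. The following is a minimal sketch under assumed field names (`attachment_group`, `section_key`, `bookmark`, `effective_date`), none of which come from the original production.

```python
def push_down_to_attachments(updates, attachments, group_field="attachment_group"):
    """Copy index, bookmark, and date data from each parent update-document
    onto the attachments that share its grouping value.

    All field names are hypothetical. Attachments without a matching parent
    are flagged as orphans rather than silently dropped.
    """
    parents = {update[group_field]: update for update in updates}
    inherited = ("section_key", "bookmark", "effective_date")
    enriched = []
    for attachment in attachments:
        parent = parents.get(attachment[group_field])
        if parent is None:
            enriched.append({**attachment, "orphan": True})
            continue
        extra = {name: parent[name] for name in inherited}
        enriched.append({**attachment, **extra, "orphan": False})
    return enriched
```

Flagging orphans keeps unmatched attachments visible for manual review.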

Finally, we needed to distinguish draft versions from active versions of the documents, or else risk incorrectly replacing an active version with an archived or draft version. With no apparent metadata to support this distinction, we turned to the files themselves. After manually identifying examples of both kinds of documents, we used their extracted text as keys to mine the text and classify each update-document. We then merged this constructed active/draft index with the original metadata to subset and purge the draft update-documents, along with their respective embedded dependencies.
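A simple version of this kind of key-based classification might look like the sketch below. The key phrases are placeholders; in practice they would come from the manually identified example documents, and the actual text-mining approach may well have been more involved.

```python
def classify_version(text, active_keys, draft_keys):
    """Label extracted document text as 'draft', 'active', or 'unknown'.

    Keys are lowercase phrases taken from hand-labeled examples (hypothetical
    here). Whitespace is collapsed so line breaks in extracted text do not
    prevent a match. Draft keys are checked first as the more cautious choice.
    """
    normalized = " ".join(text.lower().split())
    if any(key in normalized for key in draft_keys):
        return "draft"
    if any(key in normalized for key in active_keys):
        return "active"
    return "unknown"


def purge_drafts(documents, labels):
    """Keep only documents whose id is not labeled 'draft'."""
    return [doc for doc in documents if labels.get(doc["id"]) != "draft"]
```

Documents labeled `unknown` are retained by `purge_drafts`, leaving them available for manual review.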

With the metadata sufficiently enriched, we needed to turn this one static metadata document into thousands: an instruction set for every day in the range of updates. After applying a text overlay to the compiled update-documents, we assembled the whole-document guidelines, incorporating or substituting update-documents as they became active. We also included the apparent document index and document classifiers as bookmarks for ease of navigation.
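The daily instruction sets can be thought of as point-in-time snapshots: for each day, take the most recent active update-document for every section. The sketch below shows one way to express that, with assumed field names (`section_key`, `effective_date`) and without the PDF assembly itself.

```python
from datetime import date, timedelta


def snapshot(updates, as_of):
    """Return the latest update per section that is effective on or before `as_of`.

    Field names are hypothetical. Later updates replace earlier ones for the
    same section, and the result is ordered by section key.
    """
    latest = {}
    for update in sorted(updates, key=lambda u: u["effective_date"]):
        if update["effective_date"] <= as_of:
            latest[update["section_key"]] = update
    return [latest[key] for key in sorted(latest)]


def daily_instruction_sets(updates):
    """Build one snapshot for every day between the first and last update."""
    if not updates:
        return {}
    start = min(u["effective_date"] for u in updates)
    end = max(u["effective_date"] for u in updates)
    days = (end - start).days
    return {
        start + timedelta(days=offset): snapshot(updates, start + timedelta(days=offset))
        for offset in range(days + 1)
    }
```

With a Credit update on Monday, an Income update on Wednesday, and a second Credit update on Friday, Tuesday's snapshot contains only the first Credit version, while Friday's contains the second Credit version followed by Income.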


The results were true-to-form versions of the guideline and program guide for every day in the provided metadata date range, text-searchable and bookmark-indexed as intended. The attached embedded documents lay in adjacent, bookmark-based folders, active according to their master update-document. In effect, this created a longitudinal view of the guideline and program guides over time.

From this amalgam of metadata, we constructed an archival format of the guideline, its program-specific guideline components, and the subprime manual. It took a paradigm shift in our approach, accompanied by several technology-enabled techniques, including text mining and data aggregation and analysis, but the resulting reconstructed documents enabled the Mortgage Re-Underwriting Due Diligence team to provide superior work in an expedited time frame.