Direct Document Capture in Nuxeo Using Ephesoft and CMIS
Our partner, Nuxeo, blogs about how document capture can be a critically important part of any system that uses a content repository. They also mention that another Netlocity partner, Ephesoft, offers an excellent solution for document capture and data extraction, with built-in support for exporting documents via CMIS.
Ephesoft (in this case the Enterprise Edition) offers “intelligent document capture”. It can scan physical and electronic documents, automatically process their content (OCR, ICR, image processing, etc.), and export or report on the results.
Content Management Interoperability Services (CMIS) is an open standard that allows different content repositories to interoperate. Specifically, CMIS defines an abstraction layer for document management using web protocols.
To be clear, Ephesoft’s CMIS support means it can export content to any repository that properly supports CMIS, with no special tooling or integration. Nuxeo has excellent CMIS support, so this kind of integration is easy.
If you would like to find out more about connecting content management applications using CMIS, watch our webinar where CMIS visionary Jeff Potts and I discuss CMIS and its value.
Here’s a short rundown on how to capture documents directly in Nuxeo using Ephesoft and CMIS. Note that in this example Ephesoft is running on Windows, so all file paths are in Windows format.
You can find a complete tutorial on setting up document capture here:
http://www.ephesoft.com/wiki/index.php?title=Tutorial
I will just summarize the basic steps and follow with some helpful tips:
Create a “batch class”.
Define a “document type”.
Define the “index fields”.
Define the “key value extraction” for those fields.
Tip: In most cases I recommend using Advanced Extraction rather than Key-Value Extraction, because it’s more explicit and intuitive. With Advanced Extraction you visually define the exact capture area, marking the label (green) and the field (red).
Here’s a helpful video about how to set up Advanced Extraction:
http://wiki.ephesoft.com/advanced-key-value-extraction
Tip: Ephesoft uses “confidence” scores to decide whether a document matches a particular document type and whether its fields match. If something does not match, user intervention is generally required. Confidence scores are not percentages but an index capped at 40. For basic testing and development it’s perfectly acceptable to set the confidence score to 0 to avoid any human intervention.
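To make the effect of the cap concrete, here is a small Python model of the threshold decision. It is an illustrative assumption, not Ephesoft’s code: it assumes a match is accepted when its score, capped at 40, reaches the configured threshold.

```python
# Illustrative model only; Ephesoft's real scoring logic is internal to the product.
MAX_CONFIDENCE = 40  # per the tip above, scores are an index capped at 40


def needs_review(score: float, threshold: float) -> bool:
    """Return True if a match with this score would be routed to a human.

    Assumes a match is accepted when its capped score reaches the threshold.
    """
    capped = min(score, MAX_CONFIDENCE)
    return capped < threshold
```

Under this model a threshold of 0 accepts every match, which is why it suits testing. A threshold above 40 would send every document to review, because no score can exceed the cap.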
Tip: If you’re accessing the Ephesoft UI from a machine other than the Ephesoft host, document images may not show up. To fix this, edit the file “C:\Ephesoft\Application\WEB-INF\classes\META-INF\dcma-batch\dcma-batch.properties” and set the property “batch.base_http_url” to use the IP address or hostname of the Ephesoft server.
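If you edit dcma-batch.properties often (for example, when moving between test hosts), a small script can change only the hostname in batch.base_http_url and keep the scheme, port, and path that are already in the file. The Python sketch below is hypothetical tooling. It handles only simple `key=value` lines, so back up the file before you use it.

```python
from urllib.parse import urlsplit, urlunsplit

KEY = "batch.base_http_url"


def retarget_base_url(properties_text: str, new_host: str) -> str:
    """Return the file text with only the host of batch.base_http_url replaced."""
    lines = []
    for line in properties_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == KEY:  # commented-out lines start with '#' and never match
            parts = urlsplit(value.strip())
            # Keep the existing port, if any; scheme, path and query carry over unchanged.
            netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
            line = f"{KEY}={urlunsplit(parts._replace(netloc=netloc))}"
        lines.append(line)
    return "\n".join(lines) + "\n"
```

For example, `path.write_text(retarget_base_url(path.read_text(), "ephesoft.example.com"))` rewrites the file in place. Here `path` is a `pathlib.Path` pointing at the properties file, and the hostname is a placeholder.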
You need to create a folderish document into which Ephesoft will export the documents.
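You can create this folder in the Nuxeo UI. Because Nuxeo supports CMIS, you can also create it over the CMIS AtomPub binding: POST an Atom entry (Content-Type `application/atom+xml;type=entry`) to the parent folder’s children collection. The Python sketch below only builds that entry. The folder name is a hypothetical example, and the children URL comes from the parent folder’s links in the AtomPub feed. Some servers may expect extra Atom elements (such as `id` or `updated`), so treat this as a starting point.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
CMISRA = "http://docs.oasis-open.org/ns/cmis/restatom/200908/"
CMIS = "http://docs.oasis-open.org/ns/cmis/core/200908/"
for prefix, uri in (("atom", ATOM), ("cmisra", CMISRA), ("cmis", CMIS)):
    ET.register_namespace(prefix, uri)


def folder_entry(name: str) -> bytes:
    """Build an Atom entry asking a CMIS server to create a cmis:folder called `name`."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = name
    obj = ET.SubElement(entry, f"{{{CMISRA}}}object")
    props = ET.SubElement(obj, f"{{{CMIS}}}properties")
    # Each CMIS property is an element whose child <cmis:value> holds the value.
    for tag, prop_id, value in (
        ("propertyId", "cmis:objectTypeId", "cmis:folder"),
        ("propertyString", "cmis:name", name),
    ):
        prop = ET.SubElement(props, f"{{{CMIS}}}{tag}", propertyDefinitionId=prop_id)
        ET.SubElement(prop, f"{{{CMIS}}}value").text = value
    return ET.tostring(entry, encoding="utf-8", xml_declaration=True)
```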
You may need to create a new document type in Nuxeo to hold the information coming from Ephesoft. This depends on whether you want to reuse an existing document type (in which case, beware of any events or automation attached to that type) or create a new one that decouples the documents coming from Ephesoft from any existing content. A new type gives you complete control over what happens after the documents arrive, without affecting any existing business logic.
For security reasons you may want to create a dedicated user for Ephesoft, with permissions limited so that it doesn’t have full access to the whole repository.
To integrate Nuxeo and Ephesoft via CMIS you only need to complete two steps:
Configure the CMIS plug-in.
Configure the field mapping.
Use the Ephesoft Admin Client to perform these steps.
From the “Batch Class Management” tab, open your batch class.
Select the “Module” tab.
Double-click “CMIS-Export”.
Then click the Edit button to make the necessary changes.
Configure the following options:
Cmis Root Folder Name: the folder you created in Nuxeo to receive the documents. The path should be relative to the repository name.
Cmis Upload File Extension: either “pdf” or “tiff”.
Cmis Server URL: use the format “http://server:port/nuxeo/atom/cmis”.
Cmis Server User Name: a Nuxeo username that has write access to the “Cmis Root Folder Name”.
Cmis Server User Password: the password for that Nuxeo user.
Cmis Server Repository Id: the name of the Nuxeo repository, usually “default”.
Cmis Server Switch ON/OFF: make sure this is set to “ON”.
Do not enter a leading or trailing slash in the Cmis Root Folder Name.
Do not end the Cmis Server URL with a trailing slash.
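Most export failures come down to a typo in these fields. As a pre-flight check, this Python sketch encodes the rules above. It treats the slash rules as applying to the root folder name and the server URL. The dataclass and its field names are hypothetical, because Ephesoft stores these values in its own configuration rather than in this structure.

```python
import re
from dataclasses import dataclass


# Hypothetical container for the CMIS-Export values typed into the Admin Client.
@dataclass
class CmisExportSettings:
    root_folder_name: str
    upload_file_extension: str
    server_url: str
    user_name: str
    password: str
    repository_id: str = "default"
    switch: str = "ON"


# http(s)://server[:port]/nuxeo/atom/cmis with no trailing slash
NUXEO_CMIS_URL = re.compile(r"https?://[^/:]+(:\d+)?/nuxeo/atom/cmis")


def check_settings(s: CmisExportSettings) -> list[str]:
    """Return human-readable problems; an empty list means nothing obvious is wrong."""
    problems = []
    if s.root_folder_name.startswith("/") or s.root_folder_name.endswith("/"):
        problems.append("Cmis Root Folder Name: remove the leading/trailing slash")
    if s.upload_file_extension not in ("pdf", "tiff"):
        problems.append("Cmis Upload File Extension: use pdf or tiff")
    if not NUXEO_CMIS_URL.fullmatch(s.server_url):
        problems.append("Cmis Server URL: expected http://server:port/nuxeo/atom/cmis")
    if not s.user_name or not s.password:
        problems.append("Cmis Server User Name/Password: both are required")
    if not s.repository_id:
        problems.append("Cmis Server Repository Id: usually 'default'")
    if s.switch != "ON":
        problems.append("Cmis Server Switch ON/OFF: must be ON")
    return problems
```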
Click “OK” to save the edit, and be sure to click “Apply” to permanently commit the changes. Finally, click “Validate” and then “Deploy Workflow” any time you make plug-in changes.
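To confirm that the Cmis Server URL, the credentials, and the Cmis Server Repository Id all line up, you can fetch the CMIS AtomPub service document from the same URL and read the repository IDs it advertises. This is a standard-library Python sketch. It assumes that the endpoint accepts HTTP Basic authentication for the Ephesoft user.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET

NS = {
    "app": "http://www.w3.org/2007/app",
    "cmisra": "http://docs.oasis-open.org/ns/cmis/restatom/200908/",
    "cmis": "http://docs.oasis-open.org/ns/cmis/core/200908/",
}


def repository_ids(service_doc: bytes) -> list[str]:
    """Extract every repositoryId listed in a CMIS AtomPub service document."""
    root = ET.fromstring(service_doc)  # root element is <app:service>
    path = "app:workspace/cmisra:repositoryInfo/cmis:repositoryId"
    return [el.text for el in root.findall(path, NS)]


def fetch_service_document(url: str, user: str, password: str) -> bytes:
    """GET the service document using HTTP Basic authentication (an assumption)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()
```

For example, `repository_ids(fetch_service_document("http://server:8080/nuxeo/atom/cmis", "ephesoft", "secret"))` should list the value to enter as the Repository Id. The host, port, and credentials are placeholders. If the call fails with an HTTP error, check the URL and user before digging into Ephesoft.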
Locate the file “C:\Ephesoft\SharedFolders\BC4\cmis-plugin-mapping\DLF-Attribute-mapping.properties”. Here you must define the mapping between your Ephesoft document type and the corresponding Nuxeo document type. Ephesoft values are on the left, Nuxeo values on the right.
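Before dropping test files, it can help to check that every Ephesoft document type in your batch class has an entry in this file. This Python sketch assumes simple `EphesoftName=NuxeoName` lines that use `=` as the separator. It does not handle the full Java .properties syntax.

```python
def parse_mapping(text: str) -> dict[str, str]:
    """Parse simple 'EphesoftName=NuxeoName' lines, skipping blanks and comments."""
    mapping: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # '#' and '!' start comments in .properties files
        key, sep, value = line.partition("=")
        if sep:  # this sketch ignores ':' or whitespace separators and escapes
            mapping[key.strip()] = value.strip()
    return mapping


def unmapped(document_types: list[str], mapping: dict[str, str]) -> list[str]:
    """Ephesoft document types with no entry on the left-hand side of the file."""
    return [t for t in document_types if t not in mapping]
```

For example, with hypothetical types, `unmapped(["Invoice", "Contract"], parse_mapping("Invoice=InvoiceDoc"))` reports `["Contract"]`.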
When you configured your batch class, you defined a folder where Ephesoft expects to find documents to import (the “UNC Folder” property). Drop a PDF or TIFF into this folder and Ephesoft will work its magic. After a few minutes you’ll end up with a document in Nuxeo at the path you configured. Easy peasy!
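Any tool that writes files into the UNC Folder will do. For scripted testing, this Python sketch copies a PDF or TIFF into the watch folder. The paths are hypothetical, and accepting `.tif` alongside `.tiff` is an assumption about your setup.

```python
import shutil
from pathlib import Path

# Extensions this sketch lets through; ".tif" is an assumption about your setup.
ACCEPTED_SUFFIXES = {".pdf", ".tif", ".tiff"}


def drop_into_watch_folder(src: Path, watch_folder: Path) -> Path:
    """Copy a scan into the batch class's UNC Folder and return the new path."""
    if src.suffix.lower() not in ACCEPTED_SUFFIXES:
        raise ValueError(f"{src.name}: only PDF or TIFF files are captured")
    if not watch_folder.is_dir():
        raise FileNotFoundError(f"watch folder not found: {watch_folder}")
    target = watch_folder / src.name
    shutil.copy2(src, target)  # copies file contents plus timestamps
    return target
```

For example, `drop_into_watch_folder(Path(r"C:\scans\invoice.pdf"), Path(r"\\ephesoft-host\share\input"))` copies one scan into a placeholder share.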
Tip: If something doesn’t work, open the “Batch Instance Management” tab in the Ephesoft Admin Client, locate the failing batch, click the “>>” button, and then click the “Troubleshoot” button.
This allows you to download a copy of all the logs and involved documents for that batch. Generally the Application Log contains the most useful information.
Tip: A failing batch can be restarted using the “Restart” button. This restarts the failing step, not the entire batch! If the CMIS export isn’t working, you can make changes and retry just the export.