How to transition managed content when part of a company is sold
In my first post for the Better ECM blog I will discuss how to transition managed content when part of a company is sold.
When a company sells part of itself to another company, there are many details to attend to. Potentially one of the most difficult is the transfer of documents. There can be thousands, hundreds of thousands, or even millions of documents that need to be transitioned from the old company to the new one. Having the documents in an ECM repository can be both a blessing and a curse. It is a blessing because it should be easier to locate the content that needs to be handed over. It is a curse because of the technical complexity of gathering up these documents. If the new company also wants the metadata for the documents, the challenge is even greater.
Traditionally this is handled in one of two ways: either the in-house staff responsible for the care and feeding of the source ECM repository or repositories are told to make it happen, or an outside consultant is brought in to help. The in-house staff may or may not be equipped to handle such a challenge. In either case, the standard client and administrative applications for the ECM systems are unlikely to be able to handle these types of requests.
Fortunately there is another option. Enter ECMG and Job Manager. With Job Manager you can set up one or more jobs to export all of the content needed. You can decide whether or not to include the metadata. You can even scrub the metadata if necessary to eliminate noise or sensitive information. Job Manager connects to many of the popular ECM systems and ECMG continues to create new connectors all the time.
Let’s walk through a typical scenario. The CEO announces to the company on Thursday afternoon, June 28th, “We have spun off our downstream division to ABC marketing. The deal will officially close on October 1st. We need to provide all of the documents and data for ABC to be turned over by the closing date.” At this point IT starts trying to figure out how to isolate this scope of information and how to make it available. For the content that is in one or more of the company’s ECM systems, they will need to determine which content is relevant and how to hand it over. In addition, they will likely need to purge it from their systems once the deal closes.
At a later date, delete the content
In our scenario we will assume that we are talking about millions of documents. In a case like this, the standard search capabilities of the typical ECM repository are not up to the task. By design they are intended to search through a massive amount of content and return a relatively small but relevant set of results. In this scenario we will need to get medieval. In other words, we will need to search using the underlying database if we want to manage such a large result set. All we really need in this result set, though, are the unique identifiers for the content. We may be able to identify it all with a single query, but the more likely scenario is that we will need to use multiple queries against multiple sources or repositories to get back identifiers for everything we need.
Once we have done this the real work will begin. Using Job Manager, the simplest way to handle this is to bring the identifiers in using a text file with the IDs. There are also options for querying the database directly from Job Manager, but in some cases the database may not be directly accessible from the machine Job Manager is running on. In this walkthrough we will assume we have a file with a list of IDs.
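As a rough illustration of the database step, here is a minimal Python sketch that streams identifiers from a query into a text file, one per line. It uses the standard library's sqlite3 module purely as a stand-in for whatever database sits under the repository. The function name `write_ids` and the table and column names in the usage comment (`documents`, `object_id`, `division`) are hypothetical; a real ECM repository will have its own schema and driver.

```python
import sqlite3


def write_ids(conn, query, params, out_path, mode="w", fetch_size=10_000):
    """Stream the first column of a query result into a text file, one ID per line.

    Use mode="a" to append the results of additional queries to the same file.
    Returns the number of IDs written.
    """
    written = 0
    cursor = conn.cursor()
    cursor.execute(query, params)
    with open(out_path, mode, encoding="utf-8") as out:
        while True:
            # fetchmany avoids loading millions of rows into memory at once
            rows = cursor.fetchmany(fetch_size)
            if not rows:
                break
            for row in rows:
                out.write(f"{row[0]}\n")
                written += 1
    return written


# Hypothetical usage: the table and column names are placeholders,
# not a real ECM schema.
# conn = sqlite3.connect("repository_copy.db")
# write_ids(conn, "SELECT object_id FROM documents WHERE division = ?",
#           ("downstream",), "ids.txt")
```

When several queries are needed, calling the function again with `mode="a"` appends to the same file, which matches the multiple-query situation described above.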
The screen shot below shows Job Manager with a single fictitious job defined for exporting financial documents. This particular job has exactly 22 million documents broken up into batches of 100,000 documents each.
This job has already completed, but let’s take a look at the configuration. We have set the max concurrent batches to five on this machine. That means that the job was running five batches at the same time. This lets the work complete considerably faster than if the job were executed one batch at a time. We have also set the save mode to archive, which creates a single file containing all of the versions and metadata for each document upon export.
Let’s walk through the creation of a smaller export job. We will export some patent-related documents. Actually these are just test documents from a FileNet VM image, but you get the idea.
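To put numbers on that configuration, here is a small back-of-the-envelope calculation in Python. The function name `batch_plan` is made up for illustration, and the "rounds" figure is an idealized estimate that assumes every batch takes about the same time, which real exports will not.

```python
import math


def batch_plan(total_items, batch_size, max_concurrent):
    """Return (number of batches, idealized number of rounds of concurrent batches)."""
    batches = math.ceil(total_items / batch_size)
    rounds = math.ceil(batches / max_concurrent)
    return batches, rounds


# The fictitious financial job: 22 million documents, 100,000 per batch,
# five batches at a time.
batches, rounds = batch_plan(22_000_000, 100_000, 5)
print(batches, rounds)  # 220 44
```

So the fictitious job consists of 220 batches, and with five running at once it needs about 44 rounds instead of 220 sequential runs.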
When you click on the new job button the job wizard opens to walk you through the process.
Enter a name for the job. We are going to call this one Export Patent Documents. A job description is optional.
Next we will select the operation we want to perform with this job. We select Export.
Next we will specify the batch size. Since I know this is a rather small job I will use a small batch size of 100.
Now we need to specify the source type. For this scenario we will use a list of IDs from a text file. I browsed to the file called Patent documents.txt, which contains one FileNet object ID on each line.
Next we need to specify where the documents are currently managed. We select a pre-existing connection to our FileNet P8 image. Creating the connection is covered in other articles, so we won’t repeat it here.
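Before handing an ID file to any tool, it can be worth sanity-checking it. The sketch below is one way to do that in Python. It assumes the object IDs are GUIDs, optionally wrapped in braces, which is how FileNet P8 IDs are commonly written; adjust the pattern if your IDs look different. The function name `load_ids` is hypothetical and is not part of Job Manager.

```python
import re

_GUID = r"[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}"
# Accept either a bare GUID or one wrapped in a matching pair of braces.
GUID_RE = re.compile(rf"^(?:\{{{_GUID}\}}|{_GUID})$")


def load_ids(lines):
    """Split raw lines into (valid_ids, rejected), skipping blanks and duplicates.

    rejected is a list of (line_number, text) pairs for lines that do not
    look like a GUID.
    """
    valid, rejected, seen = [], [], set()
    for line_number, raw in enumerate(lines, start=1):
        value = raw.strip()
        if not value:
            continue  # ignore blank lines
        if not GUID_RE.match(value):
            rejected.append((line_number, value))
            continue
        key = value.strip("{}").upper()  # treat braces and case as insignificant
        if key in seen:
            continue  # drop duplicate IDs
        seen.add(key)
        valid.append(value)
    return valid, rejected


# Typical use, reading the file one line at a time:
# with open("Patent documents.txt", encoding="utf-8") as f:
#     ids, problems = load_ids(f)
```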
Next we have the option of selecting a pre-existing transformation file. In this case we want the documents exactly as they are classified in the source repository. We will highlight transformations in another article.
After we click finish the new job is shown in Job Manager. Notice that there are no items for this job yet.
To actually populate this job we will click the create batches button. This will get all the ids for this job and build out the batches. After creating the batches we now have 547 items for this job.
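Conceptually, building batches just means slicing the ID list into fixed-size chunks. The following Python sketch shows the idea for this job's numbers; it is only an illustration of the concept, not how Job Manager implements it, and `make_batches` is a made-up name.

```python
def make_batches(ids, batch_size):
    """Split a list of IDs into consecutive batches of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [ids[start:start + batch_size] for start in range(0, len(ids), batch_size)]


# 547 placeholder IDs with a batch size of 100, as in this job.
batches = make_batches([f"id-{n}" for n in range(547)], 100)
print(len(batches), len(batches[-1]))  # 6 47
```

With 547 items and a batch size of 100, that gives five full batches and a sixth batch holding the remaining 47 items.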
To start this job we will click on the Execute button.
Job Manager will ask if you want to start all batches. Click on yes.
Job Manager is configured here to execute a maximum of five concurrent batches, so five of them are running at once.
Note that the first five batches are executing in parallel on this machine. Job Manager can also run additional batches for the same job at the same time on other machines to scale up even further.
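For readers who want a feel for what a cap on concurrent batches means, here is a generic Python sketch using a thread pool limited to five workers. It illustrates the general pattern only and makes no claim about Job Manager's internals. `run_batches` and the `export_batch` callable are placeholders you would supply yourself.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batches(batches, export_batch, max_concurrent=5):
    """Run export_batch on every batch, with at most max_concurrent running at once.

    Returns a dict mapping batch index to whatever export_batch returned.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = {pool.submit(export_batch, batch): index
                   for index, batch in enumerate(batches)}
        for future in as_completed(futures):
            # result() re-raises any exception raised inside export_batch
            results[futures[future]] = future.result()
    return results


# Placeholder "export" that just counts the items in each batch.
print(run_batches([["a", "b"], ["c"], []], len))
```

Batches beyond the first five simply wait in the pool's queue until a worker becomes free, which is the same behavior you see when a sixth batch starts only after one of the first five finishes.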
Now the job has completed. Notice that all but one of the documents succeeded.
If we click on the View Failed Item Summary button we can see the nature of the failure(s) for the job. In this case our single failure was the result of a zero-byte file that was in FileNet. Although this was allowed in FileNet, we flagged it here as a failure to alert the user that there is no actual content in this document.
Job Manager can also export the failure information to Excel to more easily share it with the business users.
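If you ever need to produce a similar report by hand, the idea is easy to sketch in Python: flag any document whose content length is zero and write the results to a CSV file, which Excel can open. The field names (`document_id`, `reason`) and function names here are assumptions for illustration and do not reflect Job Manager's actual report layout.

```python
import csv
import io


def find_empty_documents(sizes):
    """sizes maps document ID -> content length in bytes; return failure rows."""
    return [{"document_id": doc_id, "reason": "Zero-byte content"}
            for doc_id, size in sizes.items() if size == 0]


def write_failure_report(failures, stream):
    """Write failure rows as CSV with a header line."""
    writer = csv.DictWriter(stream, fieldnames=["document_id", "reason"])
    writer.writeheader()
    writer.writerows(failures)


# Small in-memory demonstration with made-up document IDs.
buffer = io.StringIO()
write_failure_report(find_empty_documents({"doc-1": 2048, "doc-2": 0}), buffer)
print(buffer.getvalue())
```

To write a real file, open it with `open("failures.csv", "w", newline="")` and pass the file object in place of the buffer.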
If we look at the output folder in Windows we can see that there are indeed 546 files, just as Job Manager said there would be.
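The same check can be scripted. This short Python sketch compares the expected number of exported files (items minus failures) with the number of files actually present in a folder. It assumes one export file per successful document sitting directly in the output folder, as in this walkthrough; the function name `reconcile` and the folder path in the usage comment are placeholders.

```python
from pathlib import Path


def reconcile(output_dir, total_items, failed_items):
    """Return (expected, actual) file counts for an export output folder."""
    expected = total_items - failed_items
    actual = sum(1 for entry in Path(output_dir).iterdir() if entry.is_file())
    return expected, actual


# For this job: 547 items with one failure means 546 files are expected.
# expected, actual = reconcile(r"C:\Exports\Patents", 547, 1)
# if expected != actual:
#     print(f"Mismatch: expected {expected} files, found {actual}")
```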
We can view the export files in the Cts Reader or we can look at them in any zip application such as WinZip.
Here in the Cts Reader we can easily view the classification of the document.
For a more detailed view we can also see the raw XML that was generated during the export process.
If we want to give this export to someone without the ECMG applications, they can also see the contents of the export with WinZip. The XML with all the metadata is in the root of the zip file, while the contents for each version are in numbered folders, starting with zero for the oldest version.
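Because the export is an ordinary zip file, it can also be inspected with a few lines of Python. The sketch below lists the XML files in the root and groups the remaining entries by their numbered version folder. It relies only on the layout described above; the function name `describe_archive` and the file name in the usage comment are placeholders, and the actual names inside a real export may differ.

```python
import zipfile
from collections import defaultdict


def describe_archive(archive):
    """Return (root_xml_files, versions) for an export zip.

    versions maps the numeric folder name (0 = oldest) to the file names inside it.
    archive may be a path or a file-like object.
    """
    root_xml = []
    versions = defaultdict(list)
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            folder, _, rest = name.partition("/")
            if not rest:
                # no slash in the name, so the file sits in the archive root
                if name.lower().endswith(".xml"):
                    root_xml.append(name)
            elif folder.isdigit():
                versions[int(folder)].append(rest)
    return root_xml, dict(sorted(versions.items()))


# Hypothetical usage; the file name is a placeholder:
# metadata_files, versions = describe_archive("exported_document.zip")
# print(metadata_files, versions)
```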
In a future post I will show how these files can be imported into another ECM repository.
Look here for more information on Job Manager.