I need to extract a sizeable amount of data (> 1000 pages) from a Microsoft Content Management Server (MCMS) database for use in a Sitecore website.
I can see two main options:
- Migrate the data into a new simplified database and display that information in the new website.
- Convert the MCMS solution to SharePoint and use the SharePoint connector module available for Sitecore to display this information.
I would prefer the first route: there are no plans to use SharePoint to manage data/content in the future, and I would rather store this information in a simple SQL Server database to allow better searching.
I’ve looked at the database in question and think that the main tables I’d be interested in are Node, NodePlaceholder and NodePlaceholderContent but am struggling to find what I would expect. Can anyone out there give a bit of an explanation about the schema of this database for me? Or am I going to have problems trying to migrate the data in this way?
I’ve just recently been going through a similar process of exporting content pages out of MCMS 2002 (migrating to WordPress).
I’m not saying this is the 100% correct way to get the data but it worked for me.
Here’s the process I’ve taken to get page content out of the database.
As you’ve already seen, the tables storing most of the data are `Node` and `NodePlaceholderContent`.
1.) To get an idea of what the `Node` table holds, you can view the contents organized by type.
2.) Pages (and Posts, which I will cover further down) are type = 16… but to get just pages (and not posts) we need to filter by `IsShortcut = 0`.
3.) I only wanted published pages, so filter by `ApprovalStatus = 1`.
4.) Next, determine page created/modified by (with usernames).
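As a rough illustration of steps 1 to 3, here is a minimal sketch that runs the same kind of filter against a toy SQLite table from Python. Only `Type`, `IsShortcut`, `ApprovalStatus` and `ParentGUID` come from the steps above; the `NodeGUID` and `Name` columns, the sample rows and the `Type = 1` value for the container are assumptions made for the example, so check them against your real MCMS schema.

```python
import sqlite3

# Toy stand-in for the MCMS Node table.
# Type, IsShortcut, ApprovalStatus and ParentGUID are named in the post;
# NodeGUID and Name are ASSUMED columns, and all rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Node (
        NodeGUID TEXT PRIMARY KEY,
        ParentGUID TEXT,
        Name TEXT,
        Type INTEGER,
        IsShortcut INTEGER,
        ApprovalStatus INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO Node VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("f1", None, "Folders", 1, 0, 1),      # container (Type 1 is made up)
        ("p1", "f1", "About us", 16, 0, 1),    # published page
        ("p2", "f1", "Draft page", 16, 0, 0),  # unpublished page
        ("s1", None, "About us", 16, 1, 1),    # post (shortcut)
    ],
)

# Step 1: how many rows of each Type are there?
type_counts = conn.execute(
    "SELECT Type, COUNT(*) FROM Node GROUP BY Type ORDER BY Type"
).fetchall()

# Steps 2 and 3: keep published pages only.
published_pages = conn.execute(
    """
    SELECT NodeGUID, Name
    FROM Node
    WHERE Type = 16           -- pages and posts
      AND IsShortcut = 0      -- drop the posts
      AND ApprovalStatus = 1  -- published only
    ORDER BY Name
    """
).fetchall()

print(type_counts)
print(published_pages)
```

Against the real database you would run just the two SELECT statements in SQL Server; the Python wrapper is only there to make the example self-contained.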
5.) Next, figure out where in the hierarchy we are by using the `Node.ParentGUID` column.
This query let me know that pages are either in parent nodes named `Folders` or `Archive Folder`.
6.) Go up another level (get parent of parent).
The parent of parent is `Server` (the root level), so now my conclusion is that if the page’s parent is:
- `Folders`: then that’s an active page
- `Archive Folder`: then that’s a previous revision of another page
I only want active pages, so I’m going to JOIN on the `Folders` parent only.
7.) Now how about the markup. In our MCMS template there was only one placeholder area. The `NodePlaceholder` table will identify the name of the placeholder(s), which is helpful if you have multiple placeholder areas in your template. I’m only going to join on `NodePlaceholderContent` for simplicity.
8.) So at this point I got a little stuck trying to determine where the page is in the system (i.e. its relative path, or what channel it lives in). Going back to steps 1 and 2, type = 16 can be either a post or a page (which aren’t the same thing, but they are related). So now we JOIN our pages to the post records to determine pathing.
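Steps 5 to 7 might look roughly like the sketch below, again on toy SQLite tables. The join columns are assumptions: I am assuming `NodePlaceholderContent` links back to the page through a `NodeGUID` column and keeps the markup in a column called `Content`, and that `Node` has `NodeGUID` and `Name` columns. The real names may differ, so treat this as a shape for the query rather than the exact statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ASSUMED columns: NodeGUID, Name, and everything in NodePlaceholderContent.
conn.executescript(
    """
    CREATE TABLE Node (
        NodeGUID TEXT PRIMARY KEY, ParentGUID TEXT, Name TEXT,
        Type INTEGER, IsShortcut INTEGER, ApprovalStatus INTEGER
    );
    CREATE TABLE NodePlaceholderContent (NodeGUID TEXT, Content TEXT);
    """
)
conn.executemany(
    "INSERT INTO Node VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("root", None, "Server", 1, 0, 1),          # Type 1 is made up
        ("live", "root", "Folders", 1, 0, 1),
        ("arch", "root", "Archive Folder", 1, 0, 1),
        ("p1", "live", "About us", 16, 0, 1),       # active page
        ("p0", "arch", "About us", 16, 0, 1),       # previous revision
    ],
)
conn.executemany(
    "INSERT INTO NodePlaceholderContent VALUES (?, ?)",
    [("p1", "<p>Current text</p>"), ("p0", "<p>Old text</p>")],
)

# Join each page to its parent and keep only pages under 'Folders',
# then pull in the placeholder markup for those pages.
active_pages = conn.execute(
    """
    SELECT page.Name, content.Content
    FROM Node AS page
    JOIN Node AS parent
      ON parent.NodeGUID = page.ParentGUID
    JOIN NodePlaceholderContent AS content
      ON content.NodeGUID = page.NodeGUID
    WHERE page.Type = 16
      AND page.IsShortcut = 0
      AND page.ApprovalStatus = 1
      AND parent.Name = 'Folders'  -- skips 'Archive Folder' revisions
    """
).fetchall()

print(active_pages)
```

If your template has several placeholder areas, you would also join `NodePlaceholder` to tell them apart; that join is left out here because its columns are not described above.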
After some Google searches I stumbled upon this excerpt from Microsoft Content Management Server 2002: A Complete Guide, which really helped me get the rest of the way (and identified the `Node.Type` enums).
9.) The final step now is to keep going up the post parent hierarchy, resulting in several LEFT JOINs stepping up the `ParentGUID` chain. This query gives a visual representation of the hierarchy using these LEFT JOINs.
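The chain of LEFT JOINs in step 9 can be sketched like this. It is a toy example on SQLite: `NodeGUID` and `Name` are assumed column names, the channel rows and their `Type = 4` value are invented, and posts are taken to be the `Type = 16` rows with `IsShortcut = 1`, which is what the filter in step 2 implies. The small `to_path` helper is hypothetical and only there to show the hierarchy as a path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ASSUMED columns: NodeGUID and Name. All rows are made-up sample data.
conn.execute(
    """
    CREATE TABLE Node (
        NodeGUID TEXT PRIMARY KEY, ParentGUID TEXT, Name TEXT,
        Type INTEGER, IsShortcut INTEGER, ApprovalStatus INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO Node VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("root", None, "Server", 1, 0, 1),
        ("ch", "root", "Channels", 4, 0, 1),   # Type 4 is made up
        ("news", "ch", "News", 4, 0, 1),
        ("post1", "news", "launch", 16, 1, 1),  # post three levels down
        ("post2", "ch", "home", 16, 1, 1),      # post two levels down
    ],
)

# One LEFT JOIN per level; levels above the root come back as NULL.
rows = conn.execute(
    """
    SELECT post.Name, p1.Name, p2.Name, p3.Name
    FROM Node AS post
    LEFT JOIN Node AS p1 ON p1.NodeGUID = post.ParentGUID
    LEFT JOIN Node AS p2 ON p2.NodeGUID = p1.ParentGUID
    LEFT JOIN Node AS p3 ON p3.NodeGUID = p2.ParentGUID
    WHERE post.Type = 16 AND post.IsShortcut = 1
    ORDER BY post.Name
    """
).fetchall()


def to_path(row):
    """Turn a leaf-first row into a root-first path, skipping NULL levels."""
    return "/" + "/".join(name for name in reversed(row) if name is not None)


paths = [to_path(row) for row in rows]
print(paths)
```

You need as many LEFT JOINs as your deepest channel has levels; with a fixed number of joins, anything deeper is simply cut off at the top.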
As an aside, my task didn’t involve exporting the resource gallery content (images/docs/etc) but there should be enough information here to get a good start on that if you do require those pieces as well.
I hope this can be of some help to someone else migrating from MCMS 2002…