Intro
This section is just the back story of the incident and the Exchange Server environment. Skip to "The issue" section for the actual stuff.
I recently started a project to migrate an Exchange Server 2010 organisation to G Suite. The customer had 4 (very old) servers: 2 DAG members and 2 CAS. While we were preparing for the data migration phase, we hit a very hard brick wall. 3 hard disks on one of the servers decided it was time for them to smoke. Luckily they had 6 disks on that server, and they were using RAID5. A few days after these drives were replaced, another drive on the same server followed its fallen brothers!
In the end they managed to keep their servers up and were quick to replace the damaged disks. However, this was not the case for the databases on that server. One of the databases went out of sync; it was the largest database and all the mailboxes were on it. Unaware of this, we (or I, to be precise) started to prepare the environment and the scripts to export the mailboxes to PST. We got the storage unit that would receive the exported files ready, I added the required roles to the user account, and then it was time to execute the scripts. We were working with about 400 users, so I expected the process to take a long time, which was fine. I had warned the customer about this as well.
What actually happened was nothing we liked at all! The export requests just sat on "Queued" for 4 days, and none of them either failed or changed status. We checked the Exchange Servers for errors or stopped services but found nothing. So the search for the problem began!
The issue
When we executed the New-MailboxExportRequest command, we got no error. When we then executed Get-MailboxExportRequest, we saw the request with the status Queued. However, that status never changed. We checked the Event Viewer on the Exchange Server we ran the command from and found that the command had executed successfully. Later we checked the Event Viewer on the other servers for errors but found none.
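For reference, here is a minimal sketch of that check in the Exchange Management Shell. The mailbox alias and the UNC path are hypothetical placeholders, not the values we used, and the account running this needs the "Mailbox Import Export" role.

```powershell
# Hypothetical mailbox alias and share path, replace with your own.
New-MailboxExportRequest -Mailbox "jdoe" -FilePath "\\FILESRV\PST\jdoe.pst"

# List all export requests with their current status.
Get-MailboxExportRequest | Format-Table Name, Mailbox, Status -AutoSize

# Look closer at the requests that are still waiting in the queue.
Get-MailboxExportRequest -Status Queued |
    Get-MailboxExportRequestStatistics |
    Format-List Name, Status, StatusDetail, PercentComplete
```

In our case the first command returned without an error and the other two kept showing Queued, which is why the output alone did not point us to the cause.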
What did we do to fix it?
After some research on TechNet and other sites, we found cases similar to ours, with slight differences in the environment. There were many suggested fixes, and it turned out that this problem has many possible causes. From what I found, the issue appears to be related either to database replication or to the CAS servers. There was a lot of talk about the CAS scenarios, but very few posts mentioned the replication problems that might occur.
We put together a list of possible fixes to try:
- Restart all the DAG nodes – did not work
- Check CASMailbox features for all recipients we intended to export – all features enabled, not the cause of this issue
- Run Get-MailboxDatabase | Clean-MailboxDatabase – ran the cmdlets, was not related
- Make sure the attribute ‘MAPIBlockOutlookNonCachedMode’ is set to $false in user mailbox – it was set to $false for all users, not relevant
- Un-hide the mailboxes from GAL – no mailbox was hidden, all visible, not relevant
This list did not take us anywhere, so I also tried some extra things:
- Set the RPCClientAccessServer property in the databases to one of the CAS servers – did not fix anything
- Move one mailbox to another database to check whether the error came from the database – move request did not work, got same issue (stuck on Queued)
But so far the status of the export requests had not changed to anything else.
The fix!
In the end it turned out to be related to a problem we had never thought of. The server whose hard disks had gone up in smoke earlier was one of the DAG members :-D. And with the hard disks went the logs partition on that server. When the server was restored, the largest database went out of sync and replication failed. The copy queue length was increasing like a fast counter, and the status was stuck on synchronising. Obviously we needed to reseed the failed database copy. However, given the state of the server hardware and the time we had already lost, we wanted to take another approach.
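If you suspect the same cause, the copy status is quick to check. This is only a sketch; the database and server names are placeholders in the same style used elsewhere in this post.

```powershell
# Show every copy of one database, with its status and queue lengths.
Get-MailboxDatabaseCopyStatus -Identity "<MAILBOX_DBNAME>" |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength -AutoSize

# Or show all database copies hosted on one DAG member.
Get-MailboxDatabaseCopyStatus -Server "<SVR_NAME>" |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength -AutoSize
```

A copy queue length that keeps growing, together with a status that never reaches Healthy, is the symptom described above.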
The proper way to fix this
First, if you want to follow the proper way, here are the steps to reseed the database copy:
- Suspend the database copy first (either from EMC or EMS)
- Use this cmdlet to reseed the database:
Update-MailboxDatabaseCopy -Identity <MAILBOX_DBNAME>\<SVR_NAME> -DeleteExistingFiles
Once the update and the replication are both complete, you can run the New-MailboxExportRequest cmdlet again and things should work fine.
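Put together, the reseed could look roughly like the sketch below. We did not go this route ourselves, so treat it as an outline rather than a tested procedure. The suspend comment is just an example text.

```powershell
# 1. Suspend replication for the broken copy.
Suspend-MailboxDatabaseCopy -Identity "<MAILBOX_DBNAME>\<SVR_NAME>" -SuspendComment "Reseeding after disk failure"

# 2. Reseed the copy, removing the existing database and log files on the target.
Update-MailboxDatabaseCopy -Identity "<MAILBOX_DBNAME>\<SVR_NAME>" -DeleteExistingFiles

# 3. Confirm that the copy is healthy and the queues have drained.
Get-MailboxDatabaseCopyStatus -Identity "<MAILBOX_DBNAME>\<SVR_NAME>" |
    Format-List Name, Status, CopyQueueLength, ReplayQueueLength
```

As far as I know, Update-MailboxDatabaseCopy resumes replication by itself once the seeding finishes, unless you run it with -ManualResume. Keep in mind that seeding a large database over the network can take many hours.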
That’s it!
The not-very-proper way to fix this
Well, we needed to get the data out fast, and we knew the servers were struggling and in their final days. So we changed one attribute on the mailbox database that allowed us to run the export without taking the status of the other mailbox database copy into account.
In a DAG environment, the passive database copy and the log shipping affect how mailbox export and move requests are handled. If you have a passive database copy that is out of sync, or the logs are lagging behind, the move or export request will not start. This behaviour is defined by the attribute ‘DataMoveReplicationConstraint’. All I had to do was set it to None, which means the database copy and log status are not taken into consideration at all. This makes the database behave as in a standalone environment (with no DAG), and so the export request starts.
Get-MailboxDatabase | Set-MailboxDatabase -DataMoveReplicationConstraint None
The goal of this property and this behaviour in Exchange Server is to reduce data loss during a fail-over, so that you come out of it with the least amount of data lost.
You can find more information on the ‘DataMoveReplicationConstraint‘ property here.
Once I set the value to None, the export started again and we were able to export all the data we needed from the server.
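If you use this workaround on servers that will stay in production, it is probably wise to limit it to the affected database and to put the constraint back afterwards. The sketch below assumes the previous value was SecondCopy, which to my knowledge is the default for a database with copies in a DAG; check your own value before changing anything.

```powershell
# Record the current value for every database before changing it.
Get-MailboxDatabase | Format-Table Name, DataMoveReplicationConstraint -AutoSize

# Relax the constraint on the affected database only.
Set-MailboxDatabase -Identity "<MAILBOX_DBNAME>" -DataMoveReplicationConstraint None

# After the exports are done and the copy is healthy again, restore it.
# SecondCopy is an assumption here, use the value you recorded above.
Set-MailboxDatabase -Identity "<MAILBOX_DBNAME>" -DataMoveReplicationConstraint SecondCopy
```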
Disclaimer: The Microsoft Exchange Server 2010 logo is the property of Microsoft. I do not own it and have no affiliation with Microsoft. I used it on this site for demonstration purposes only.