Friday, July 03, 2009

VSS_E_UNEXPECTED_PROVIDER_ERROR with SnapManager for Exchange

I have a client with a NetApp StoreVault S550 system. After a series of issues with SnapManager for Exchange not working with the latest .NET service packs, we finally had reliable, ongoing snapshots for Exchange.

However, just this week, I started to get the following notifications that an error was occurring:

From SME:

Backup SG [First Storage Group] Error: VSS API Error Description: VSS_E_UNEXPECTED_PROVIDER_ERROR

At the S550 console:

Fri Jul 3 21:18:01 CST [app.log.err:error]: EXCHSVR: SME Version 4.0: (111) Backup: SnapManager for Exchange online backup failed. (Exchange Version 6.5 (Build 7638.2: Service Pack 2)) Error code: 0x8004230f, VSS API Error Description: VSS_E_UNEXPECTED_PROVIDER_ERROR
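If you want to pick these failures out of a pile of console messages, the error code and VSS name can be pulled from the line with a regular expression. This is only a rough sketch in Python: the pattern is based on the single log line above, and the function name parse_vss_error is made up for illustration, so other SME versions may word the message differently.

```python
import re

# Log line copied from the S550 console output in this post.
LOG_LINE = (
    "Fri Jul 3 21:18:01 CST [app.log.err:error]: EXCHSVR: SME Version 4.0: (111) "
    "Backup: SnapManager for Exchange online backup failed. "
    "(Exchange Version 6.5 (Build 7638.2: Service Pack 2)) "
    "Error code: 0x8004230f, VSS API Error Description: VSS_E_UNEXPECTED_PROVIDER_ERROR"
)

# Assumption: the "Error code: ..., VSS API Error Description: ..." wording
# is stable; it is inferred from the one example line above.
PATTERN = re.compile(
    r"Error code: (?P<code>0x[0-9a-fA-F]+), "
    r"VSS API Error Description: (?P<name>VSS_E_[A-Z_]+)"
)


def parse_vss_error(line):
    """Return (numeric_code, name) for an SME VSS failure line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return int(match.group("code"), 16), match.group("name")


if __name__ == "__main__":
    result = parse_vss_error(LOG_LINE)
    if result is not None:
        code, name = result
        print(f"{name} (0x{code:08x})")
```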

Basically, the error occurs because the volume's snapshot limit (maximum 255) had been reached. I was curious why this would happen, as old snapshots should have been deleted. Checking the status from the console showed, however, that autodelete was disabled.
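Since the failure only shows up once the limit is actually hit, it is worth checking how much headroom a volume has left. Here is a minimal sketch in Python, assuming you already have the snapshot count for the volume from somewhere. The function names and the warning threshold of 20 are my own choices for illustration; only the 255 limit comes from the error above.

```python
MAX_SNAPSHOTS_PER_VOLUME = 255  # per-volume limit mentioned in this post


def snapshot_headroom(current_count, limit=MAX_SNAPSHOTS_PER_VOLUME):
    """Return how many more snapshots fit before the limit is reached."""
    if current_count < 0:
        raise ValueError("snapshot count cannot be negative")
    return max(limit - current_count, 0)


def describe_volume(name, current_count, warn_below=20,
                    limit=MAX_SNAPSHOTS_PER_VOLUME):
    """Build a one-line status message for a volume (hypothetical helper)."""
    headroom = snapshot_headroom(current_count, limit)
    if headroom == 0:
        status = "CRITICAL"   # new snapshots will fail, as SME did here
    elif headroom < warn_below:
        status = "WARNING"    # arbitrary threshold, tune to your schedule
    else:
        status = "OK"
    return f"{status}: {name} has {current_count}/{limit} snapshots ({headroom} left)"


if __name__ == "__main__":
    print(describe_volume("EXCHSRVRexchange_log", 255))
```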

S550*> snap autodelete EXCHSRVRexchange_log show
snapshot autodelete settings for EXCHSRVRexchange_log:
state : off
commitment : try
trigger : volume
target_free_space : 20%
delete_order : oldest_first
defer_delete : user_created
prefix : (not specified)

The following command enables autodelete:

snap autodelete EXCHSRVRexchange_log on

Settings now:

S550*> snap autodelete EXCHSRVRexchange_log show
snapshot autodelete settings for EXCHSRVRexchange_log:
state : on
commitment : try
trigger : volume
target_free_space : 20%
delete_order : oldest_first
defer_delete : user_created
prefix : (not specified)
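If you capture that output to a file or a script, the settings are easy to turn into a dictionary so the state can be checked automatically. A rough sketch in Python, assuming the "key : value" layout shown above; the function names are hypothetical and the header line is skipped simply because it has no " : " separator.

```python
# Settings lines as shown in the console output above (state now "on").
SAMPLE_OUTPUT = """\
state : on
commitment : try
trigger : volume
target_free_space : 20%
delete_order : oldest_first
defer_delete : user_created
prefix : (not specified)
"""


def parse_autodelete_settings(text):
    """Parse 'key : value' lines into a dict; other lines are ignored."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # only lines that actually contain the separator
            settings[key.strip()] = value.strip()
    return settings


def autodelete_enabled(text):
    """Return True when the parsed 'state' setting is 'on'."""
    return parse_autodelete_settings(text).get("state") == "on"


if __name__ == "__main__":
    for key, value in parse_autodelete_settings(SAMPLE_OUTPUT).items():
        print(f"{key} = {value}")
    print("autodelete enabled:", autodelete_enabled(SAMPLE_OUTPUT))
```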

The console then shows a series of autodelete tasks running to clean up the 4 months of excess snapshots for this volume. However, here is the kicker - if you run the snapshot process again, autodelete actually deletes the latest snapshot just created.

So turn autodelete off once more and verify operation. It turns out that on the same volume there was a SQL LUN, with SM for SQL also having an issue. As a result, a stack of snapshots was being taken with no actual data in them: because SM for SQL never finished a backup job, the volume hit the 255 snapshot limit very rapidly, and then SME wouldn't work, let alone SM for SQL. The SQL issue was a database that refused to index properly and was crashing SnapManager for SQL. With that fixed, SM for SQL started to work as well.

Seems operational for now - will need to keep an eye on things to see if all is good. Remember - leave autodelete off!

1 Comment:

At 8:35 pm, Anonymous said...

Thanks for this mate, helped me out a lot!!!

Mark

 
