SQL Server Instance Recovery

Recovering a SQL Server Instance

Options and Considerations for recovery with an Instance Failure

  • Has the database got any high availability features?  If so consider using these as first priority.  Replication/Log shipping failover may be the best option - minimal data loss and fast recovery.  Ideally have a DR server set up with all the user logins/SSIS/agent jobs/maintenance plans/mail notifications/servers links ready to avoid a complex system table restore. May require client change/dns cname change if different IP/hostname.
  • Consider repairing the database(s) if corruption has occurred.  This may be possible. Advantage would be minimal data loss, no client change.
  • The preferred method for recovering a complete instance would be to request a recovery of the server itself using Acronis.  This will provide us with a base to restore our user database backups onto and is less complex plus guaranteed same configuration and files compared to a full system database level recovery to a different server.  In a multi instance environment where only once instance needs recovery ensure the non effected database is backed up to a remote location first.
  • If Acronis not available and a full system database level recovery is required then consider that restoring all system/user databases to new server with same version build of sql server is mandatory.  The same host name, instance name and drives/paths is also helpful as will require less work. 
  • Possibly last resort restore to a different build / newer version.  Would only wish to do this in a controlled move/upgrade but users database can be restored back to newer version.  This could cause compatibility issues and user logins/SSIS/agent jobs/maintenance plans/mail notifications/server links would need to be manually set up.  Logins move using procedure if possible.

Rules and Limitations for Recovering An Instance

If we lose an entire sql server instance and we need to recover this to a different server then there a several options for restoring the database but we need to obey the following rules/limitations:
  • SQL SERVER backups are consistent at the point in time at the end of a backup.  Therefore if an instance has several backups in a database plan, the points in time will be different for each database. We cannot guarantee consistency between databases in SQL Server. 
  • If we require a backup restored to a point in time then roll each user database forward using transaction log backups to a point in time.  This still will not guarantee consistency between database backups.
  • If recovering from the database backups to a new instance then we need to ensure the version of SQL Server we are restoring to is exactly the same as the server the backup was taken from in order to be able to restore the system tables, this includes service pack versions.  Use select @@version to determine this.  This does not apply to the user databases and these can be restored to the same or a newer version of sql server.
  • If restoring a database from an Enterprise edition of SQL Server to a standard edition then need to consider that this may not be possible if Enterprise features are being used in that database.  These are
          Change Data Capture
          Transparent Data Encryption
          Data Compression
          Partitioning

Instance Recovery Notes

  • The master database contains server logins and system data,  msdb contains the agent scheduled alerts and jobs, history, operators, SSIS packages.  Model contains database templates which are the default options for when a new database is created.  Only ever restore these to a new instance as will overwrite any existing system databases.
  • SQL Server Maintenance plans MAY still need to be recreated after a full system and user database recovery. However the plan and job did come across during my test on 2008 R2.
  • If recovering to a different version of SQL server then system tables cannot be restored.  In this scenario we would have to manually copy logins using the copy logins guide and manually recreate any maintenance plans/jobs.  Jobs can be saved to a script and the script copied over to a new server and pasted into a newly created job as t-sql script which may save time if source is available.
  • When recovering to a new server check server is on the same domain as the server the backup was taken from.
  • Check configuration of where you are restoring databases to.  Check licensing/number of processers/disk space/memory. If Instance name/hostname is different to the original then client/application changes would be required to connect to the database in it's new location.
  • OS Permissions checks - ensure that required shared folders exist and are set up with the correct shares/privileges.  On a log shipped server primary there will be a folder containing the logs for shipping that will have access rights set up.  Use the log shipping guide to see what needs setting up at the OS level.Paths - try to keep folder locations of system dbs/logs/tempdb etc to the same drive and path as the original instance as this will require less work to restore. However the procedures are written to allow different paths/drives.

Instance Recovery Procedures

1. Start SQL Server in single user mode to enable us to recover the master database.  If engine will not start then master may need to be rebuilt from template before restoring. Please see: Starting sql server in single user / safe mode
   (Ensure the Agent is shut down)
2. Restore master from windows command line (typically in C:\Program Files\Microsoft SQL Server\<Instance Folder>\MSSQL\Binn):
sqlcmd -S <server>\<instance>,<port> -U <user> -P <password> for example : sqlcmd -S MDCMSSDBT1R\MDCMSSDBT1R_2,1443 -U sa -P password
    1> RESTORE DATABASE master FROM DISK = ‘<backup file>’ WITH REPLACE
    2> GO
The sql server engine should automatically shut itself down after master database restored.
NOTE: Master will be restored to the location of the master database specified in the advanced startup parameters for the sql server service (existing folder).
To change location then shutdown sql server, change path in startup parameter, physically move master files (checking sql service has access to folder)  then restart.
3. If the recovery server system data file paths are THE SAME as the original server then bring the SQLSERVER service back online in multi user mode (remove -m) but leave the agent and other sql services shut down. Go to step 5 and skip 4.
If system data file paths ARE DIFFERENT then go to step 4.
4. Start sql server in single user mode but add -T3608; (no recovery flag) after the -m; single user option (or use sqlservr -c -m -f -T3608 (Note:Cant get this to work so use method 1!)) or the service will not start due to the incorrect paths where it will look for it's system tables.
   From sqlcmd (using sa user details from backup as this is now restored) run:
   a. SELECT name, physical_name FROM master.sys.master_files WHERE Database_ID < 5 (this will list all datafiles for system tables with database_id < 5 filtering system data files only)
   b. For each data file listed, execute ALTER DATABASE <database> MODIFY FILE (NAME = <name>, FILENAME = '<filename>') where (database) is the database, (name) is the logical name from the list and (filename) is the path to the installed data folder with the file name from the list.
Remove -T3608 and -m from startup parameters and restart the sql server service (not agent)!
5. Drop user databases (they won't actually exist but are stored in master)
6. Restoring model and msdb can then be done through management studio or from sqlcmd with:
   1> RESTORE DATABASE model FROM DISK = '<backup file>' WITH REPLACE (model not necessary but can be useful)
   2> GO
   3> RESTORE DATABASE msdb FROM DISK = '<backup file>' WITH REPLACE
   4> GO
7. Restore user databases from backup
8. IF hostname is different on recovery server then we need to rename the server/instance name to reflect the new server name/recovered instance name.
   SELECT @@servername (to check names)
   EXEC sp_dropserver 'oldservername\oldinstancename'
   GO
   EXEC sp_addserver 'newservername\newinstancename','local'
   GO
   RESTART SQL SERVER for change to take effect
   SELECT @@servername (to check new names applied)
9. If the instance has been renamed using the above then the Sql Agent may fail to start with a connection error as it does not have permissions (old instance name in the sql users)
In MSSMS check NT SERVICE\MSSQL$<instancename> and NT SERVICE\SQLAgent$<instancename> are there and create new users for these    using the new instance name. Needs to be set up with windows authentication and sysadmin privileges - no mapping but default database master.
10. If paths are different on recovery server then you may see errors in the sql error log for subsystems not starting when the agent is started.  This is because it is looking for the library files in the original paths.  To update the subsystems run the follow script then restart the agent and verify the errors no longer appear.
   USE msdb
   GO
   DELETE FROM msdb.dbo.syssubsystems
   EXEC msdb.dbo.sp_verify_subsystems 1 
   GO
12.  Turn on the service broker on MSDB in database options.  Shut down the sql server agent first as it requires exclusive access to turn this on.  If this is not done then you will be warned the broker is off when checking mail profiles.  By default when a restore of a database is done the broker will be switched off.
13.  Check mail profile in SQL Server Agent properties is enabled.  I had to enable this.
14. Change mail profile server name (from address) details if this has changed.
DONE!

Post Recovery Checks

  • Check sql error logs to ensure there are no errors
  • Check all jobs / maintenance plans are accessible and test where possible.  Names may also need to be changed for tidyness where they reference the original servername/instance name.
  • Check the datafiles/logs are in the correct locations
  • Sent a test notification email to ensure email notification working.
  • Check backup jobs will not overwrite any existing backups required by a live system if doing this is a test environment.  I had to disable the backup jobs as I was restoring a live db to different hardware.
  • Consider requesting a new acronis / snapshot backup of system immediately after post recover.

Comments

Popular posts from this blog

SQL SERVER – Event ID 107- Report Server Windows Service (MSSQLSERVER) cannot connect to the report server database.

SQL Server Builds Information

Using DBCA silent install and disabling automatic memory management