General Checklist 
This checklist covers questions from Important Questions About Your Setup, with cross-references to other sections in this documentation. It is difficult to define reasonable thresholds that trigger specific recommendations because there are so many complex and interrelated dependencies to consider. Therefore, many of the recommendations are formulated in general terms and need to be adapted to your particular installation.
Note
Risk analysis with checklists
Before you start to implement high availability in your systems, we strongly advise you to identify which of the items listed in this general checklist and in the potential single point of failure (SPOF) checklist are relevant for you. First, rank the vulnerable aspects of your system according to the cost of failure (for example, whether you need to call in an external engineer, order a replacement part, and so on). Then you can start improving availability.
You consider how much system uptime your business requires:
5 days a week / 12 hours a day
Sufficient offline time to do system maintenance and offline database backups during operational days
Database size might require online database backups during operational days or partial offline backups
Upgrade (AS ABAP), system maintenance, and offline database backups can be done during nonoperational days (for example, at weekends)
For information on database backups, see (listed in alphabetical order):
5 days a week / 24 hours a day
Online database backups are required during operational days
Upgrade (AS ABAP) and general system maintenance have to be done during nonoperational days
Offline database backups have to be done during nonoperational days
7 days a week / 12 hours a day
Sufficient nonoperational time available during each day to do system maintenance and offline database backups
Database size might require online database backups
Scheduling SAP Upgrade (AS ABAP) might become an issue
7 days a week / 24 hours a day
Special time slots have to be defined to do upgrades, both Upgrade (AS ABAP) and database software upgrades. For more information about database upgrades, see (listed in alphabetical order):
Database backups have to be online. For more information, see (listed in alphabetical order):
Redundancy measures, such as Switchover Software and an Uninterruptible Power Supply (UPS), are worth considering.
Disaster Recovery is also worth considering.
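The four uptime profiles above differ mainly in how much time per week remains for offline work. The following minimal sketch makes that arithmetic explicit. The function name `weekly_offline_hours` is hypothetical and chosen for illustration only. The result is an upper bound on the available maintenance window, not a statement about how long any particular backup or upgrade takes.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours


def weekly_offline_hours(days_per_week: int, hours_per_day: int) -> int:
    """Return the hours per week that fall outside the required operating time."""
    if not (0 <= days_per_week <= 7 and 0 <= hours_per_day <= 24):
        raise ValueError("days_per_week must be 0-7 and hours_per_day 0-24")
    return HOURS_PER_WEEK - days_per_week * hours_per_day


# The four profiles from the checklist: (days per week, hours per day).
for days, hours in [(5, 12), (5, 24), (7, 12), (7, 24)]:
    free = weekly_offline_hours(days, hours)
    print(f"{days} days / {hours} hours: {free} hours per week without required uptime")
```

For the 7 days / 24 hours profile the result is zero, which is why special time slots have to be defined and database backups have to be online.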
You consider how much system downtime your business can tolerate until there is only a minor business effect:
Several hours or one business day
Most likely no special measures necessary for protection against hardware or operating system failure
Standard recovery procedures for databases are most likely sufficient
Less than above
Redundant components might become necessary, such as switchover software, improved disk technology, network high availability, and Uninterruptible Power Supply (UPS).
Database backup frequency, restore, and recovery times have to be evaluated.
If restore takes too long, backup devices need to be replaced with faster ones or more devices added. Alternatively, you might need to increase the backup frequency. For more information, see (listed in alphabetical order):
Backup, Recovery, and Upgrade with DB2 for LUW
If the data volume is simply too large to finish restore and recovery in an acceptable period, you need to evaluate your disk technology. You can employ, for example, mirrored disks to avoid restore and recovery altogether (except in the event of multiple simultaneous failures).
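A rough way to evaluate restore times against tolerated downtime is sketched below. It is a simplified estimate under stated assumptions: the function names and the example figures are hypothetical, restore throughput is assumed to scale linearly with the number of backup devices, and the time to apply logs during recovery is passed in as a separate estimate. Measured values from a real restore test should replace these assumptions.

```python
def restore_hours(data_gb: float, throughput_gb_per_hour: float, devices: int = 1) -> float:
    """Estimate restore time, assuming throughput scales linearly with device count."""
    if throughput_gb_per_hour <= 0 or devices < 1:
        raise ValueError("throughput must be positive and devices at least 1")
    return data_gb / (throughput_gb_per_hour * devices)


def fits_downtime(data_gb: float, throughput_gb_per_hour: float,
                  tolerated_hours: float, recovery_hours: float = 0.0,
                  devices: int = 1) -> bool:
    """Check whether restore plus recovery fits into the tolerated downtime."""
    total = restore_hours(data_gb, throughput_gb_per_hour, devices) + recovery_hours
    return total <= tolerated_hours


# Hypothetical figures: 2000 GB database, 200 GB/h per device, 4 hours tolerated.
for n in (1, 4):
    ok = fits_downtime(2000, 200, tolerated_hours=4.0, recovery_hours=1.0, devices=n)
    print(f"{n} device(s): {restore_hours(2000, 200, n):.1f} h restore, fits: {ok}")
```

A more frequent backup shortens the recovery part of the estimate, while faster or additional devices shorten the restore part. If neither brings the total below the tolerated downtime, disk technology is the remaining lever.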
You consider how much system downtime your business can tolerate until there is a major business effect (for example, loss of business):
If your business can only tolerate a few hours of downtime:
You must make sure that a restore or recovery can be completed in the time available. For more information, see (listed in alphabetical order):
If this cannot be guaranteed, you need to evaluate your disk technology (see next point).
Redundancy for hardware components becomes important, so evaluate the use of:
Special disk technology (for example, consider disk mirroring, RAID, or LVM). Disks are among the most vulnerable of all hardware components so it makes sense to start with them.
Redundant network components.
Cluster CPUs with switchover solutions to protect the database server and/or the central application server. See Switchover Software for High Availability.
Uninterruptible Power Supply (UPS) is cheap and worth considering.
If your database is Oracle or DB2 for z/OS, an alternative to switchover software is to use Replicated Database Servers together with the DB reconnect feature. For more information on DB reconnect, see DB Reconnect (AS ABAP) and DB Reconnect (AS Java).
More than one node should be available as application server. You also need to prepare at least two nodes to act as central application server after a manual reconfiguration. This means that, if the node where the central application server is running becomes unavailable, another node should be prepared to start the central application server.
You consider how much system downtime your business can tolerate until it faces collapse:
Evaluate your system for potential single points of failure. Even if the system is 90% equipped for high availability (for example, mirrored disks, redundant network components, and so on), this is worthless if one of the components in the unprotected remaining 10% fails.
If replacing critical hardware components or reconfiguring critical software components is likely to take so much time that your entire business is at risk, you need to seriously consider Disaster Recovery using a backup site.
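The point about the unprotected remainder can be illustrated with the standard formula for components in series: if the system needs every component to be up and failures are independent, the overall availability is the product of the individual availabilities. The sketch below assumes exactly that. The function name and the availability figures are hypothetical and serve only as an illustration.

```python
from math import prod


def serial_availability(component_availabilities: list[float]) -> float:
    """Overall availability when every component is required (independent failures assumed)."""
    if not all(0.0 <= a <= 1.0 for a in component_availabilities):
        raise ValueError("each availability must be between 0 and 1")
    return prod(component_availabilities)


# Hypothetical example: three well-protected components and one unprotected one.
protected = [0.9999, 0.9999, 0.9999]
unprotected = [0.99]
overall = serial_availability(protected + unprotected)

print(f"overall availability: {overall:.4%}")
print(f"weakest component:    {min(protected + unprotected):.4%}")
```

The overall value can never exceed that of the weakest component, so investment in already protected components does little while a single point of failure remains.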
You consider whether your business has periods with special availability requirements.
If there are critical periods of an application that cannot be interrupted at all, or where an interruption can be tolerated for no more than a couple of minutes, you might need to take extra precautions. Consider the following:
Disk technology (for example, consider disk mirroring, RAID or LVM)
Network components
Switchover Software or the use of Replicated Database Servers together with the DB reconnect feature (only available for certain databases). For more information on DB reconnect, see DB Reconnect (AS ABAP) and DB Reconnect (AS Java).
Uninterruptible Power Supply (UPS) or redundant power supplies
You consider factors concerning your installation:
Age of system
The approach you need to take depends on whether a new hardware system is being installed with the SAP system or whether it is to be installed with existing hardware:
If a new system is being set up, you should evaluate the high availability requirements at an early stage, and design the new system accordingly.
If the SAP system is being installed on an existing system, you need to investigate the system for weak points.
Evaluate installation options
You evaluate the installation options for application servers and the database server. Depending on the desired installation and high availability requirements, you need to carefully consider the mapping of SAP system services since this might not be a straightforward task.
Expected Data Volume
The database setup needs to be carefully planned:
Proper planning of database layout makes space management a lot easier. For more information, see (listed in alphabetical order):
Space Management with DB2 for i
Large databases (> 100 GB) might already cause problems when it comes to backup, because standard backup, restore, and recovery procedures might take too much time. You need to evaluate your disk technology since mirrored disks give additional options for backups (see RAID and LVM).
For more information about backup and recovery of databases, see Database High Availability.
Expected Transaction Load
The transaction load influences the installation options for application servers and the database servers.
You consider your internal resources.
Budget available to finance improvements.
Most companies need to make improvements in the following order, according to financial constraints:
Disk technology
Uninterruptible Power Supply (UPS)
Switchover Software for database and central application hosts
Availability of qualified personnel
The level of qualified personnel available to monitor the system during “normal operation” hours might influence the level of redundancy you choose:
Qualified personnel not always available
Certain technologies, such as disk technology, switchover solutions, and redundant network components, reduce the need to have personnel available to handle errors.
Qualified personnel always available
You can rely on your personnel to handle errors.
Be sure to have the appropriate processes in place to get your staff involved in the event of an error (for example, hotline, on-call, standby).
You consider your external resources.
Support contracts
The level of support contracts in place might influence your approach to high availability:
“Special” maintenance contracts in place
Do you have contracts with hardware and software vendors that guarantee, for example, replacement of faulty hardware components within 24 hours? If so, you might choose not to have a disaster recovery site or to implement less comprehensive redundant hardware, such as disk technology, networks, and so on.
“Standard” maintenance contracts in place
If your maintenance contracts are only standard, you might choose to have a higher level of availability to cover gaps in maintenance. In this case, you might set up a disaster recovery site and implement more comprehensive redundant hardware, such as disk technology, networks, and so on.
Access to your system for remote support and maintenance
Before implementation, consider using the GoingLive service. For proactive SAP system administration that helps prevent problems from arising, consider using the EarlyWatch service. SAP provides both services.
For more information about GoingLive and EarlyWatch, see SAP Safeguarding.
You consider environmental or other factors.
Examples of these factors are:
Unstable power supply in your area
Consider using an Uninterruptible Power Supply (UPS) or redundant power supplies.
Likelihood of disaster such as earthquake
Consider using a disaster recovery site.
High temperature
Consider using a reliable air-conditioning facility.
Switch from summer to winter time
Consider using the DST safe kernel.