Identifying the Problem Requiring a Restore (SAP Library - SAP Database Guide: Informix (BC-DB-INF-DBA))

Identifying the Problem Requiring a Restore

Use

Before you start a restore of the Informix database, make sure that you have identified the problem and know what kind of failure has occurred.

Procedure

Answer the following questions:

Is the database server up or has the failure caused it to go down?

To find out whether the database server is up or down, you can enter the following from the command line:

$ onstat -

If the response is not similar to the following example, then the database server might have failed:

INFORMIX-OnLine Version 7.20.UC3 --On-Line-- Up 6 days 20:57:16 -- 80298 Kbytes

To rule out that the database server was simply shut down normally, see "What is in the message file?" below.

If the database server is blocked, the reason is sometimes given in an extra information line, as in the following example:

Blocked: Media_failure

If the database server is down due to a problem, this generally means that a failure has occurred in a "critical" dbspace (logdbs, physdbs or rootdbs). Whether the database server is up or down influences what kind of restore you need.
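The check described above can be sketched as a small parser for the text printed by onstat -. This is a minimal sketch, not an official tool: the function name parse_onstat_header and the regular expression are assumptions modelled only on the sample header line and the "Blocked:" line shown above, and other server versions may format these lines differently.

```python
import re

# Pattern modelled on the sample header line in this section (assumption):
#   INFORMIX-OnLine Version 7.20.UC3 --On-Line-- Up 6 days 20:57:16 -- 80298 Kbytes
HEADER_RE = re.compile(
    r"--\s*(?P<mode>[A-Za-z][A-Za-z -]*?)\s*--\s*Up\s+(?P<uptime>.+?)\s+--"
)


def parse_onstat_header(output):
    """Return mode, uptime and any 'Blocked:' reason, or None if no header is found.

    A return value of None means the response does not look like the sample
    header, so the database server might have failed.
    """
    result = None
    for line in output.splitlines():
        match = HEADER_RE.search(line)
        if match and result is None:
            result = {
                "mode": match.group("mode"),
                "uptime": match.group("uptime"),
                "blocked": None,
            }
        elif result is not None and line.strip().startswith("Blocked:"):
            # Extra information line, for example "Blocked: Media_failure"
            result["blocked"] = line.split(":", 1)[1].strip()
    return result
```

The text passed to the function could, for instance, be captured with subprocess.run(["onstat", "-"], capture_output=True, text=True).stdout. If the function returns None, continue with the message file check before concluding that a restore is needed.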

Which dbspaces have failed?

You need to identify which dbspaces have failed:

· Critical dbspaces (logdbs, physdbs, rootdbs)

If a critical dbspace has failed, then the database server goes down.

· Non-critical dbspaces (all remaining dbspaces)

If only a non-critical dbspace has failed, then the database server might still be up.

The type of dbspace that has failed determines what kind of restore you need to do. If the database server is still up, identify which dbspaces have failed by using the following procedure:

Enter the following command from the command line:

$ onstat -d

Read the second section of output from this command, as in the following example (only part of the output is shown):

Chunks
address	chk/dbs	offset	size	free	bpages	flags	pathname
c34ac178	1 1	8	75000	20203		PO-	/.../physdev1/data1
c34ac398	2 2	8	175106	53		PO-	/.../physdev2/data3
c34ac470	3 3	75008	435000	7		PD-	/.../physdev1/data1
c34ac548	4 4	175114	205000	3		PO-	/.../physdev2/data3
c34ac620	5 5	8	10106	53		PO-	/.../physdev1/data2
c34ac6f8	6 6	8	50000	49939		PO-	/.../physdev2/data4
c34ac7d0	7 7	10114	350000	74345		PO-	/.../physdev1/data2
c34ac8a8	8 8	50008	10000	1295		PO-	/.../physdev2/data4
c34ac980	9 4	360114	150000	265		PO-	/.../physdev1/data2
...

Check "flags" for a value of "D" (that is, "down") in the second position, and then read across to find the value in "chk/dbs" (that is, "chunk/dbspace"). In this example, chunk 3 – belonging to dbspace number 3 – is down.
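Scanning the "Chunks" section by eye is error-prone when there are many chunks, so the same check can be sketched in code. The function name find_down_chunks is hypothetical, and the parser assumes the column layout of the example above (chunk and dbspace numbers in the second and third fields, flags in the second-to-last field, and no spaces in the pathname).

```python
def find_down_chunks(onstat_d_output):
    """Return (chunk, dbspace) number pairs whose flags have 'D' in the second position."""
    down = []
    in_chunks = False
    for line in onstat_d_output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "Chunks":
            # Start of the second section of the onstat -d output
            in_chunks = True
            continue
        # Skip everything before the Chunks section, the column header,
        # and short lines such as "..."
        if not in_chunks or tokens[0] == "address" or len(tokens) < 8:
            continue
        chunk, dbspace = tokens[1], tokens[2]
        flags = tokens[-2]  # assumption: pathname is the last field, flags precede it
        if chunk.isdigit() and dbspace.isdigit() and len(flags) >= 2 and flags[1] == "D":
            down.append((int(chunk), int(dbspace)))
    return down
```

Applied to the example output above, this sketch reports chunk 3 of dbspace number 3, the only row whose flags read PD-.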

Read the first section of the output to find the name of the affected dbspace, as in the following sample of output from this example:

dbspaces
address	number	flags	fchunks	nchunks	flags	owner	name
c34ac108	1	1	1	1	N	informix	rootdbs
c34ad2c8	2	1	2	1	N	informix	logdbs
c34ad338	3	1	3	2	N	informix	psapes30e
...

Look for the dbspace number 3 and read across to find the name of the dbspace. In this example, the affected dbspace is psapes30e, a non-critical dbspace.
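The lookup from dbspace number to name, and the critical or non-critical classification, can be sketched as follows. The helper names dbspace_names and classify_dbspace are hypothetical. The parser assumes that the dbspace number is the second field and the name is the last field, as in the example above, and the set of critical names is the one used in this guide (logdbs, physdbs, rootdbs); an installation with different dbspace names would need a different set.

```python
# Critical dbspace names as listed in this guide (assumption: your
# installation uses the same names).
CRITICAL_DBSPACES = {"rootdbs", "logdbs", "physdbs"}


def dbspace_names(onstat_d_output):
    """Map dbspace number to dbspace name from the first section of onstat -d output."""
    names = {}
    in_dbspaces = False
    for line in onstat_d_output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        head = tokens[0].lower()
        if head == "dbspaces":
            in_dbspaces = True
            continue
        if head == "chunks":
            # The chunk list follows; stop collecting dbspace rows
            in_dbspaces = False
            continue
        # Data rows have a numeric dbspace number in the second field;
        # the column header and "..." lines do not.
        if in_dbspaces and len(tokens) >= 3 and tokens[1].isdigit():
            names[int(tokens[1])] = tokens[-1]
    return names


def classify_dbspace(name):
    """Return 'critical' or 'non-critical' for a dbspace name."""
    return "critical" if name in CRITICAL_DBSPACES else "non-critical"
```

For the example output, dbspace_names(...)[3] yields psapes30e, which classify_dbspace reports as non-critical.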

What is in the message file?

Look in this file for clues to what has happened. The database server keeps a processing audit trail in this file. The file also tells you whether the database server has terminated normally, as in the following example:

16:46:07 INFORMIX-OnLine Stopped.

In this case, you should be able to see the checkpoint information before this message, indicating that the data on disk is consistent. If so, a restore is not necessary.
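A check of this kind can be sketched as a scan of the message file text. This is only an illustration under stated assumptions: the function name stopped_normally and the window parameter are hypothetical, and the default checkpoint_marker text is an assumed wording that is not taken from this guide, so compare it with the checkpoint messages in your own message file before relying on it.

```python
def stopped_normally(
    log_text,
    stop_marker="INFORMIX-OnLine Stopped.",      # wording from the example above
    checkpoint_marker="Checkpoint Completed",    # ASSUMPTION: verify against your file
    window=5,
):
    """Return True if the last stop message is preceded by a checkpoint message.

    Only the `window` non-empty lines directly before the last stop message
    are inspected.
    """
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    stop_positions = [i for i, line in enumerate(lines) if stop_marker in line]
    if not stop_positions:
        return False  # no normal-stop message found at all
    last_stop = stop_positions[-1]
    preceding = lines[max(0, last_stop - window):last_stop]
    return any(checkpoint_marker in line for line in preceding)
```

A result of False does not prove that a restore is required; it only means that the file should be read more closely.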

You can use SAPDBA to look at the message file. Refer to Listing System Information with SAPDBA.

How can I find out exactly what went wrong?

In most cases, you can use the answers to the previous questions to identify what has happened.

If you require extra confirmation and can afford to spend more time investigating, you can run the oncheck command to obtain a comprehensive picture of the disk structure. For more information on oncheck, see the Informix documentation. Depending on which parameters you use, oncheck might take up to several hours to complete.

Did the fault occur after a particular point in time?

After examining the message file (see "What is in the message file?" above), you might find that the error that caused the failure occurred at a particular time. You can then do a "Point-in-Time" (PIT) restore, which is only available if you use ON-Archive or ON-Bar for data recovery and are doing a cold restore. A PIT restore avoids using corrupted or faulty data from after this point. See Performing Logical Restore for Full-System Cold Restore (ON-Archive) or Performing Logical Restore for a Full-System Cold Restore (ON-Bar) for more information on PIT restores.

Result

Now that you have identified the problem, you need to decide what kind of restore is the most appropriate for the situation. If you suspect that the fault lies in the database server rather than in the database data, you need to find a solution to that problem first, because a restart might not be possible or the failure might recur soon after the restart. Contact the Informix hotline in this case.

See also:

Informix documentation