Overview

When system problems occur, companies often send the dumps to IBM automatically, without looking at them first. However, z/OS today is almost invariably a multi-vendor software environment.

This course teaches you the vital (yet simple) techniques that will quickly glean maximum information from the diagnostic data. This lets you identify the vendor product responsible for the problem and, ultimately, verify the vendor's diagnosis.

This course describes and explains what can go wrong in an IBM z Systems environment, and what you can do about it as an operator or systems programmer. It looks at failure situations from many points of view, including hardware problems and the software environment.

The software environment is further examined by looking at the Recovery Termination Manager (RTM), the 'cleaning-up' function of z/OS, and its ABEND concept. All the different reports that a z/OS system produces in conjunction with failures (messages, dumps, traces, etc.) are also discussed. The most common reasons for system ABENDs, and how you can analyze the information coming out of the system when they occur, are also covered.

The course is also available for one-company, on-site presentations and for live presentation over the Internet, via the Virtual Classroom Environment service.
Prerequisites

To benefit from this course, participants need both the ability to read Assembler code and familiarity with z/OS internal operations and data areas (including the concept of control block chaining). These prerequisites can be met by completing the RSM courses Using z/OS Assembler, z/OS System Fundamentals Part 1 and z/OS System Fundamentals Part 2.

Delegates will learn how to
- identify which software component caused the problem
- identify the vendor responsible for the problem
- glean the most relevant diagnostic information in the minimum time
- use the appropriate diagnostic procedure for each type of dump
- identify the failing operating system component in standalone and SVC dumps
- use various operating system data-gathering facilities such as system traces, LOGREC, and SLIP
- locate information in various manuals that is critical to problem resolution
- develop a methodology to speedily extract the required information for resolving a problem situation.

Outline

Interactive Problem Control System
Control block/data area; Information sources; Control block header; Control block data area map; Cross reference table; Fields and subfields; Field redefinitions; Control block chaining; Finding control blocks; The Prefix Area (PSA); The new Prefix Area (PSA); Dump types; IPCS introduction: what is IPCS?, What makes up IPCS?; Getting started with IPCS - Primary Option Menu; Default values selection; Primary Option Menu; Data entry panel; Pointer stack panel; Getting around in IPCS browse; IPCS subcommand entry panel; IPCS command output display; IPCS LIST command; Indirect addressing; Displaying Control Blocks; Creating SYMBOLS: Dump Directory; Additional Useful Commands; Dump analysis panel; Component Data Analysis Panel; STATUS; Analysis commands; Dump Management panel.

Recovery & Termination
MVS's recovery management; RMS; What does RTM do?; Interrupt types; Anatomy of an Interrupt; RTM - The Big Picture; How is RTM invoked?; Normal termination; Abnormal termination - problem types; Program check; Software 'Abend'; Abnormal termination - recovery; Recovery routines; RTM status information; ESPIE environment; ESPIE processing; ESTAE recovery routines; ESTAE environment; STAE Control Blocks (SCB); ESTAE processing; Percolation; Functional Recovery Routines; FRR environment; FRR stacks; RTM2WA; SDWA; Variable Recording Area; Interpreting the SDWA; Interpreting the Variable Recording Area; Logrec detail reports.

Request Block Analysis
Address space structures; RB loss of control; Linkage stacks; RB analysis procedure; Linkage Stack analysis; General analysis; RB analysis.

System Trace
Starting the System Trace; Formatting the Trace; Sequence of events; Interpreting Traces; System Trace tips.

SVC Dump Analysis Approach
Generating SVC dumps; Dump Analysis and Elimination; Types of SVC Dump; Problem resolution overview; Dump TITLE; SDWA; History; RTM2WA; Other dump types.

Multi Processor Environments
Tightly coupled processing; Prefixing; Processor coexistence; Processor STATUS; Work In Progress; Interrupt information.

Locks
The problem; An example of what can go wrong; Serialization via LOCKS; Lock varieties; Locking Hierarchy; Locking Mechanics (SPIN); Spin Loop Identification; Spin Lock Holder; Local/CML Locks; Locking Mechanics; Global Suspend Locks ANALYZE; Locks Held; Locking Mechanics (CPU LOCK); SPIN lock summary; SUSPEND lock summary; ANALYZE.

Dispatcher
What does it mean to be dispatched?; Where does the dispatcher run?; Dispatchable units of work; Who calls the Dispatcher?; Special exits; Service Request Block routines; Service Request Block (SRB); SRB example - IOS post; Service Request Block (SRB); Suspended Service Request Block (SSRB); SRB priorities; SRB scheduling with IEAMSCHD; SRB enclaves; Dispatcher queues; Scheduling service requests; Address spaces; ASCB/ASXB contents; Finding work within an address space - tasks; TCB contents; TCB chaining; Address space task structure; Serialization with Intersect; Dispatcher indicators; Global problem determination; Global indicators - SRB queues.

Address Space Control
Cross Memory Services; XMS instructions; PC & PT/PR; XMS authorisation; Primary, Secondary & Home modes; Access Register mode; SSAR; Access Register Translation (ART); Access lists; ALETs.

SAD Analysis Approach
Big picture; Dump environments; When should a SADUMP be taken?; Pre-SADUMP considerations; Taking a Standalone Dump; Standalone Dump analysis path selection; Disabled Wait analysis path; Enabled Wait analysis path; Enabled Running analysis path; Disabled Running analysis path.

Input/Output Supervisor
IOS drivers; Performing I/O; I/O flow; IOS analysis - high level; Active I/O analysis; IOS failure analysis.

Real Storage Manager
Types of storage; Dynamic Address Translation; Identifying The STD; Managing real storage; RSM high level check; Detailed analysis - high fixed page utilisation; Detailed analysis - other problems; History - Component Trace.

Auxiliary Storage Manager
Paging a frame to a slot; ASM high level check; Detailed analysis: what is the problem?, who is affected?