This course describes and explains what can go wrong in an IBM z Systems environment, and what you can do about it as an operator or systems programmer. It looks at failure situations from many points of view: the physical computer rooms, hardware problems and the software environment.

The software environment is further examined by looking at the Recovery Termination Manager (RTM) - the 'cleaning-up' function of z/OS - and its ABEND-concept.

All the different reports that come out of a z/OS system in conjunction with failures (messages, dumps, traces, etc.) are also discussed. The most common reasons for system ABENDs (and how you can analyze the information coming out of the system when they occur) are also covered.

This course is also available for one-company, on-site presentations and for live presentation over the Internet, via the Virtual Classroom Environment service.


An understanding of z/OS generally and of operational concepts in a z/OS system in particular - as taught in the course z/OS & JES2 Operations.

Delegates will learn how to

  • explain the background of error messages
  • diagnose operational central system problems
  • suggest solutions to operational problems
  • describe the diagnosis tools
  • report problems and communicate with applications personnel and systems programmers
  • understand what MVS's Recovery Termination Manager (RTM) does when programs fail
  • understand the concept of an ABEND
  • analyze ABEND situations
  • resolve ABEND situations.


What is an Operational z/OS problem?

The z/OS mainframe - a large system; what can go wrong?: the operational view, the application view; the "hole in the ground"; loss of electric power: mains supply, power inside the system, preparing for power failures; hardware problems - total loss of critical components; critical system software failure; MVS or DFP problems; VTAM or TCP/IP problems; TSO problems; problems in Database or Transaction Management systems; partial loss of hardware; CPUs; I/O channel paths; disk subsystems and DASD volumes; sections of the network; partial loss of system software; JES2 problems; JES3 problems; SMF problems; switching GLOBAL JES3; preparing for DSI; performing the DSI; ACTIVE - NOACTIVE; LASTDS; NOBUFFS; application systems failures; performance degradation; non-operational components; badly tuned system; humans or hardware?; actions: act "real time" to attempt recovery, analyze afterwards; summary; review questions.

The Hardware - CPU and Storage

A mainframe installation - a lot of hardware; the hardware components; Central Processing Unit (CPU): real (central) storage; expanded storage; Channel Subsystem (CSS) and peripheral devices; Virtual Storage; CPU modes; controlling the modes - PSW; PSW control bits; where do you find the PSW?; the real thing in each CPU; partitioning creates multiple logical CPUs; copies saved by software; Disabled Wait; MVS has decided to stop the system; an incorrect PSW: was accidentally loaded, was deliberately loaded; Enabled Wait; Enabled Loop; Disabled Loop; review questions.

The Hardware - Input/Output Processing

I/O devices; Control Units; I/O processing in principle; Defining the I/O Configuration; the Hardware System Area (HSA); the MVS configuration; Hardware Configuration Definition (HCD); the I/O users in MVS; review questions.

Hardware Errors & Recovery

What is System Recovery?; hardware error types; soft errors; hard errors; terminating errors; Machine Check processing and MCIC; masking MC interrupts; external damage code; hardware error areas; CPU errors; storage errors; Channel Subsystem errors (I/O errors); soft CPU errors; System Recovery (SR); Degradation (DG); soft CPU error reporting; hard CPU errors; System Damage (SD); instruction Processing Damage (PD); Information in PSW or Registers are valid (IV); Timing Facility Damage; the effect of hard CPU errors; terminating CPU errors; processing terminating CPU errors; Service Processor Damage; soft storage errors; MVS action after soft errors; hard storage errors; effect of hard storage errors; Channel Subsystem error reporting; Channel Path recovery; Terminal Error Condition; outstanding RESERVEs; Permanent Error Condition; Initialized Condition; I/O related errors; device/Control Unit errors (I/O errors); no path available; device status errors; Subchannel status errors; Hot I/O conditions; Hot I/O recovery; Hot I/O messages (non-DASD); Hot I/O messages (DASD); response to Hot I/O message; using IECIOSxx for Hot I/O processing; HIO options in IECIOSxx; example of IECIOSxx parameters; missing Interrupts; missing Interrupt intervals; special considerations for MIH intervals; Missing Interrupt messages; I/O Timing Facility; I/O Timing Messages; review questions.

z/OS MVS Software Environment

The z/OS environment - a lot of programs; software categories; the mission of an Operating System; workload in MVS; asking for MVS services; asynchronous MVS activities; asynchronous "unwelcome" MVS activities; summary; review questions.

Recovery Termination Manager (RTM)

Normal Program Termination; EXIT (SVC 3); abnormal program termination; Program Checks; system forced ABEND; program ABEND; why abnormal termination?; logical application error; program incomplete; application detected software error; system detected software error; hardware detected software error; PC FLIH and ABENDs; hardware detected software error example; Program Checks in the Supervisor; hardware problems; RTM actions; recovery; Functional Recovery Routines (FRRs); Extended Specify Task Abnormal Exit (ESTAE); system breakdown; software problem types; review questions.

MVS Error Reporting & Dumps

System error reporting; MVS dumps; Stand-Alone Dump (SADUMP); SVC dumps; user ABEND dumps; SYSUDUMP; SYSABEND; SYSMDUMP; CEEDUMP; generating a user ABEND dump; system generated ABEND dump; snap dumps; symptom dumps; review questions.

ABEND Analysis

What is ABEND?; the MVS ABEND service; why ABEND?; allows for recovery routines; task termination; tasks in an Address Space; how RTM is invoked; program checks; ABEND; how to trigger an ABEND; ABEND macro and SVC 13; CALLRTM macro; why not normal end?; application detected software errors; system detected software errors; all the system ABEND codes; where do you see the ABEND codes?; the NOTIFY message; the SYSLOG; the job log; the symptom dump; ABEND dumps; SVC dumps; Stand-Alone dumps; the symptom dump in the SYSLOG; the symptom dump in the job log; explanations of ABEND and reason codes; IBM z/OS manuals on the web; Quickref and similar tools; analysis approach; examples of ABEND code explanation; system messages - a good information source; system message prefix; message level; standard message types; alternative message types; message identifier and MVS components; examples of system messages; explanation of system messages; common system ABEND codes; system ABEND code numbers; common SVCs and their macros; the x22 codes - caused by outside events; the x13 codes - OPEN problems; other x13 codes; example of S013-18; 806 - Program not found; sequence of events; example of S806-04; 804, 80A, 878 and DC2 - virtual storage problems; the Virtual Address Space; "above the bar"; traditional address space areas; the need for managing virtual storage; storage for the program code; storage obtained outside the program; Virtual Storage requests; limitations on Virtual Storage; ABEND and reason codes; requests for storage below 2 GB (GETMAIN and STORAGE OBTAIN); requests for storage above 2 GB (IARV64 GETSTOR); the REGION limit; the effects of different REGION values; example of ABEND S822; the MEMLIMIT parameter; example of ABEND SDC2; the 0Cx codes; the Program Check Interrupt; running RTM1; PC FLIH and ABENDs; the meaning of Program Checks; common ABENDs from Program Checks; ABEND S0C4; Storage Protect Keys; virtual address protection; reasons for translation exceptions; address truly invalid; address valid - new area; address valid - old area; other S0Cx ABENDs; PIC 0001 Operation Exception (ABEND S0C1); PIC 0002 Privileged Operation Exception (ABEND S0C2); PIC 0007 Data Exception (ABEND S0C7); the S0E0 and 0Dx codes; miscellaneous problems; problems with translations; Linkage Stack problems; the Sx37 and SB14 codes; Sx37; EOV processing; how disk data sets are allocated; Physical Sequential (PS) data sets; problems when allocating a PS data set; initial allocation; primary allocation failure; data set full; no secondary allocation (SD37-04); secondary allocations (SB37-04); example of unavailable primary allocation; example of SD37-04; message IEC031I; example of ABEND SB37-04; message IEC030I; Partitioned Data Sets (PDS); problems when allocating a PDS; initial allocation; data set full; no secondary allocation (SD37-04); secondary allocations (SE37-04); directory full (SB14-0C); example of ABEND SE37-04; message IEC032I; example of ABEND SB14; message IEC217I; Partitioned Data Sets Extended (PDSE); problems when allocating a PDSE; summary of common system ABEND codes; other ABEND codes; MVS system codes (Sxxx); user ABEND codes (Uxxxx).


The Error Recording Data Set (ERDS) of MVS; LOGREC in MVS; LOGREC contents; LOGREC Event Record types; re-initializing LOGREC with IFCDIP00; re-allocating LOGREC with IFCDIP00; the EREP program; EREP reports; controlling EREP.

Generalized Trace Facility (GTF)

Traces in MVS; what is GTF?; how to obtain a GTF trace; the GTF JCL procedure; starting GTF; traceable events; GTF parameters - I/O events; examples of I/O parameters; CCW tracing example; CCW tracing output; dispatcher events; external interrupts; program interrupts; GTF-tracing of VTAM activity; SVC interrupts; recovery routines and SLIP events; parameter summary.



Frequently asked questions


How can I create an account on myQA.com?

There are a number of ways to create an account. If you are a self-funder, simply select the "Create account" option on the login page.

If you have been booked onto a course by your company, you will receive a confirmation email. From this email, select "Sign into myQA" and you will be taken to the "Create account" page. Complete all of the details and select "Create account".

If you have the booking number you can also go here and select the "I have a booking number" option. Enter the booking reference and your surname. If the details match, you will be taken to the "Create account" page from where you can enter your details and confirm your account.

Find more answers to frequently asked questions in our FAQs: Bookings & Cancellations page.

How do QA’s virtual classroom courses work?

Our virtual classroom courses allow you to access award-winning classroom training, without leaving your home or office. Our learning professionals are specially trained on how to interact with remote attendees and our remote labs ensure all participants can take part in hands-on exercises wherever they are.

We use the WebEx video conferencing platform by Cisco. Before you book, check that you meet the WebEx system requirements and run a test meeting (more details in the link below) to ensure the software is compatible with your firewall settings. If it doesn’t work, try adjusting your settings or contact your IT department about permitting the website.

Learn more about our Virtual Classrooms.

How do QA’s online courses work?

QA online courses, also commonly known as distance learning courses or elearning courses, take the form of interactive software designed for individual learning, but you will also have access to full support from our subject-matter experts for the duration of your course. When you book a QA online learning course you will receive immediate access to it through our e-learning platform and you can start to learn straight away, from any compatible device. Access to the online learning platform is valid for one year from the booking date.

All courses are built around case studies and presented in an engaging format, which includes storytelling elements, video, audio and humour. Every case study is supported by sample documents and a collection of Knowledge Nuggets that provide more in-depth detail on the wider processes.

Learn more about QA’s online courses.

When will I receive my joining instructions?

Joining instructions for QA courses are sent two weeks prior to the course start date, or immediately if the booking is confirmed within this timeframe. For course bookings made via QA but delivered by a third-party supplier, joining instructions are sent to attendees prior to the training course, but timescales vary depending on each supplier’s terms.

When will I receive my certificate?

Certificates of Achievement are issued at the end of the course, either as a hard copy or via email.
