Decoding Mortality Data Part 1 The Breakdown


When you download NCHS mortality data from the CDC FTP site, each record comes as a string like the one below. Decoding it is discussed in this video:

Decoding the NCHS Mortality Data Introduction

0 11519994835251999999924835200099917512 10111062381808 6 2450063 990999 99999 199956999569990000 002 390X93 435001281544100511S141 12S018 21S212 22X93 23S119 05 S018 S119 S141 S212 X93

The only way to decode this is with the Mort99doc.pdf here https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/mortality/Mort99doc.pdf.

As I was going through decoding this, I started thinking: why even worry about what is at the very beginning? It will always be 0 and blank. So we will move the decoding to the first useful piece of data.

The raw part of the string we are working on is 11519994835251999999924835200099917512, which is broken down into its field sizes below.

1 1 51 999 4 83 52 51 999 999 9 2 4 83 52 000 9 9 9 17 5 12
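
The split above can be reproduced mechanically. Here is a minimal Python sketch of that step; it is an illustration, not the script from the next part. `split_by_widths` is a made-up helper name, and the width list is simply the field sizes read off the breakdown above.

```python
def split_by_widths(text, widths):
    """Cut text into consecutive chunks whose lengths are given by widths."""
    if sum(widths) != len(text):
        raise ValueError("widths do not add up to the length of the text")
    chunks = []
    start = 0
    for width in widths:
        chunks.append(text[start:start + width])
        start += width
    return chunks


segment = "11519994835251999999924835200099917512"
# Field sizes taken from the breakdown above (tape locations 19 through 56).
widths = [1, 1, 2, 3, 1, 2, 2, 2, 3, 3, 1, 1, 1, 2, 2, 3, 1, 1, 1, 2, 1, 2]
fields = split_by_widths(segment, widths)
print(" ".join(fields))
```

The length check is there because a single wrong width silently shifts every field after it, which is easy to miss when counting by hand.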

Everything before this is blank, and we know this part starts at tape position 19, so we will count from there. Each Tape Location refers to a character position in the string. Why it is still called Tape Location I have no idea. But here it is for the first part of decoding.

Tape Location 19 is the first character in the demo string above. One amusing note here is that the same state can have different numbers in different locations. In this test line, Wyoming is 51 in one place and 52 in another (and 56 further along the record). Such standards….

–Start Decode–
Tape Location 19
Record Type
1
Residents

Tape Location 20
Resident Status
1
Residents

Tape Location 21-22
State of Occurrence
51
Wyoming

Tape Location 23 – 25
County of Occurrence
999

Tape Location 26
Region
4

Tape Location 27-28
Division and State Subcode
83
West, Mountain, Wyoming

Tape Location 29-30
Expanded State of Occurrence Code
52
Wyoming

Tape Location 31 – 32
State of Residence
51
Wyoming

Tape Location 33 – 35
County of Residence
999
balance of county

Tape Location 36-38
City of Residence
999
balance of county

Tape Location 39
Population Size of City of Residence
9
All Other Areas in the US

Tape Location 40
Metro – Non-metro County of Residence
2
Nonmetropolitan county

Tape Location 41
Region
4
West

Tape Location 42 – 43
Region Division
83
Mountain, Wyoming

Tape Location 44-45
Expanded State of Residence Code
52
Wyoming

Tape Location 46-48
NCHS PMSA/MSA of Residence
000
Nonmetropolitan Counties

Tape Location 49
Population Size of County of Occurrence
9
County of less than 100,000

Tape Location 50
Population Size of County of Residence
9
County of less than 100,000

Tape Location 51
PMSA/MSA Population size
9
Area of less than 100,000 or nonmetro area

Tape Location 52 – 53
Education
17
5 or more years of college

Tape Location 54
Education Recode
5
16 years or more

Tape Location 55-56
Month of Death
12
December
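
Since every Tape Location is just a character position, the decode above can be driven from a small table. The sketch below assumes we only have the segment that starts at Tape Location 19 (not the full record), so each location is shifted by that starting position. `LAYOUT` and `decode` are names invented for this example, and only some of the fields are listed.

```python
SEGMENT_START = 19  # tape location of the first character in the segment
SEGMENT = "11519994835251999999924835200099917512"

# (field name, first tape location, last tape location), as listed above.
LAYOUT = [
    ("Record Type", 19, 19),
    ("Resident Status", 20, 20),
    ("State of Occurrence", 21, 22),
    ("County of Occurrence", 23, 25),
    ("Expanded State of Occurrence Code", 29, 30),
    ("State of Residence", 31, 32),
    ("County of Residence", 33, 35),
    ("City of Residence", 36, 38),
    ("Education", 52, 53),
    ("Education Recode", 54, 54),
    ("Month of Death", 55, 56),
]


def decode(segment, segment_start, layout):
    """Return {field name: raw code} for every field in the layout."""
    record = {}
    for name, first, last in layout:
        lo = first - segment_start      # tape locations are 1-based and inclusive
        hi = last - segment_start + 1   # Python slices exclude the end index
        record[name] = segment[lo:hi]
    return record


record = decode(SEGMENT, SEGMENT_START, LAYOUT)
for name, value in record.items():
    print(f"{name}: {value}")
```

This only pulls out the raw codes. Turning 51 into Wyoming or 12 into December still needs lookup tables built from the documentation.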

So we have decoded the first part of the string. Then we are presented with more blanks before we get to the next usable data:
10111062381808

1 01 1 1 062 38 18 08

We can count over and see that this starts at Tape Location 59

Tape Location 59
Sex
1
Male

Tape Location 60-61
Race
01
White

Tape Location 62
Race Recode
1
White

Tape Location 63
Race Recode 2
1
White

Tape Location 64-66
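Reported Age (Detail Age)
062
The first digit is the unit and the last two digits are the number of units
0 01-99 … Years less than 100
So 062 is 62 years
Other unit codes include 2 01-11, 99 … Months and 6 01-59, 99 … Minutes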

Tape Location 67-68
Age recode 52
38
38 … 60 – 64 years

Tape Location 69-70
Age Recode 27
18
18 … 60 – 64 years

Tape Location 71-72
Age Recode 12
08
08 … 55 – 64 years
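
The Reported Age field above packs two things into three characters: the first digit is the unit and the last two are the number of units, so 062 reads as 62 years. A small sketch of that rule follows. `decode_detail_age` is a hypothetical helper, and the unit table holds only the three unit codes quoted in this post; the rest are in Mort99doc.pdf and are left out on purpose.

```python
# Unit codes quoted in this post; the remaining codes are in Mort99doc.pdf
# and are deliberately not guessed at here.
AGE_UNITS = {"0": "years", "2": "months", "6": "minutes"}


def decode_detail_age(raw):
    """Split a 3-character detail age into (number of units, unit name)."""
    if len(raw) != 3 or not raw.isdigit():
        raise ValueError(f"expected three digits, got {raw!r}")
    unit_code, value = raw[0], raw[1:]
    unit = AGE_UNITS.get(unit_code, f"unit code {unit_code} (see documentation)")
    return int(value), unit


print(decode_detail_age("062"))
```

The three age recodes that follow (38, 18 and 08) all land in brackets that contain 62, which is a handy sanity check on the detail age.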

Then we have some blanks, which are set aside for the Infant Death decode. These are Locations 73-74.

Bringing the full line here from the part we are starting at
6 2450063 990999 99999 199956999569990000 002 390X93 435001281544100511S141 12S018 21S212 22X93 23S119 05 S018 S119 S141 S212 X93

So this starts our decoding again at tape position 75

Tape Location 75
Place of Death and Decedent's Status
6
Residence

Tape Location 76 is reserved

Our next set of data to decode is 2450063, starting at position 77.

2 45 00 6 3

Tape Location 77
Marital Status
2
Married

Tape Location 78-79
State of Birth
45
Utah

Tape Location 80-81
Hispanic Origin
00
Non-Hispanic

Tape Location 82
Hispanic Origin / Race Recode
6
Non-Hispanic white

Tape Location 83
Day of Week of Death
3
Tuesday

Tape Location 84 is reserved, so blank.

We start again with this at position 85
990 999

Tape Location 85-87
990
In this case Unknown. For other codes we will have to cross reference the census occupation index to decode this number

Tape Location 88-90
Usual Occupation
999
Blank, Unknown

Everything from Tape Location 91 – 114 is either blank or a boilerplate value. We will start again at Tape Location 115 with the data below

199956999569990000

1999 56 999 56 999 0000

Tape Location 115-118
Current Data Year
1999

Tape Location 119-120
State of Occurrence
56
Wyoming

Tape Location 121-123
County of Occurrence
999

Tape Location 124-125
State of Residence
56
Wyoming

Tape Location 126-128
County of Residence
999

Tape Location 129-132
PMSA/MSA of Residence
0000
0000 … Nonmetropolitan counties or foreign residents
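
By this point Wyoming has shown up under three different numbers in one record: 51 at Tape Locations 21-22 and 31-32, 52 at 29-30 and 44-45, and 56 at 119-120 and 124-125 (56 is also Wyoming's FIPS state code). One way to keep that straight is a separate lookup table per coding scheme. The sketch below is only an illustration: the scheme labels `nchs`, `expanded` and `fips` are names picked for this example, and each table holds just the Wyoming entry seen in this record.

```python
# One lookup table per coding scheme. The scheme labels are made up for this
# sketch, and only the Wyoming codes seen in this record are filled in.
STATE_TABLES = {
    "nchs": {"51": "Wyoming"},      # tape locations 21-22 and 31-32
    "expanded": {"52": "Wyoming"},  # tape locations 29-30 and 44-45
    "fips": {"56": "Wyoming"},      # tape locations 119-120 and 124-125
}


def state_name(scheme, code):
    """Look a state code up in the table that belongs to its coding scheme."""
    return STATE_TABLES[scheme].get(code, f"unknown {scheme} code {code}")


codes_in_record = [("nchs", "51"), ("expanded", "52"), ("fips", "56")]
names = {state_name(scheme, code) for scheme, code in codes_in_record}
print(names)
```

If the set ever holds more than one name for the same field pair, either the tables or the field positions are off.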

Tape Location 133 is blank (reserved)

At location 134 we get this

00 2

Tape Location 134 – 135
CMSA of Residence
00
Not a CMSA

Tape Location 136
Injury at Work
2
No

Tape Location 137 and 138 are blank in this data set.

We will start again at 139 with the data below
3 9 0 X93

Tape Location 139
Manner of Death
3
Homicide

Tape Location 140
Activity Code
9
During unspecified Activity

Tape Location 141
Place of Injury for causes
0
Home

Tape Location 142-145
ICD Cause of death Code
X93 (extended coded will use all 4 characters)
Decoded from ICD10 Index
Assault (homicide) by handgun discharge

435 00 128 154 41 0 05 11S141

Tape Location 146 – 148
358 Cause Recode
435
The cause recodes have been modified over the years. This one is similar to the ICD-10 code and will have to be decoded with an external document

Tape Location 149 – 150
Reserved
00

Repeating complete string here for reference
0 11519994835251999999924835200099917512 10111062381808 6 2450063 990999 99999 199956999569990000 002 390X93 4350012815441005 11S141 12S018 21S212 22X93 23S119 05 S018 S119 S141 S212 X93

Tape Location 151-153
Cause Recode
128

Tape Location 154-156
Infant Cause Recode
154

Tape Location 157-158
Cause Recode
41

Tape Location 159
Reserved
0

Tape Locations 160 – 440
Multiple Conditions

Tape Location 160-161
Number of Entity-Axis conditions
05

Tape Location 162 – End
Conditions

11S141
Injury of nerves and spinal cord at neck level
12S018
Open wound of head
21S212
Open wound of thorax
22X93
Assault (homicide) by handgun discharge
23S119
Open wound of neck
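
Each condition entry above is two digits followed by an ICD-10 code written without its decimal point. The sketch below pulls those pieces apart. It rests on two assumptions that should be checked against Mort99doc.pdf: that the first digit is the line on the death certificate and the second is the position on that line, and that the entries can be split on whitespace as they appear in this post (the real file is fixed width). `parse_conditions` and `ICD10_TEXT` are hypothetical names, and the descriptions are copied from the decode above rather than from a full ICD-10 index.

```python
import re

# Descriptions copied from the decode above; a real run needs a full ICD-10 index.
ICD10_TEXT = {
    "S141": "Injury of nerves and spinal cord at neck level",
    "S018": "Open wound of head",
    "S212": "Open wound of thorax",
    "X93": "Assault (homicide) by handgun discharge",
    "S119": "Open wound of neck",
}

# Assumption: digit 1 = certificate line, digit 2 = order on that line,
# the rest = ICD-10 code without its decimal point.
ENTRY = re.compile(r"([1-9])([1-9])([A-Z][0-9]{2,3})")


def parse_conditions(text):
    """Return a list of (line, position, ICD-10 code) tuples."""
    entries = []
    for token in text.split():
        match = ENTRY.fullmatch(token)
        if match is None:
            raise ValueError(f"unexpected entity-axis entry: {token!r}")
        line, position, code = match.groups()
        entries.append((int(line), int(position), code))
    return entries


for line, position, code in parse_conditions("11S141 12S018 21S212 22X93 23S119"):
    print(line, position, code, ICD10_TEXT.get(code, "not in this small table"))
```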

–End Decode–


So at the beginning we started with this record from the Mort99us mortality file. Working through 30 pages of documentation, we decoded this
0 11519994835251999999924835200099917512 10111062381808 6 2450063 990999 99999 199956999569990000 002 390X93 4350012815441005 11S141 12S018 21S212 22X93 23S119 05 S018 S119 S141 S212 X93

Into
An adult male, 62 years old, who was born in Utah and resided in Wyoming, died on a Tuesday in December 1999 by homicide. The conditions that led up to the death were

1. Injury of nerves and spinal cord at neck level
2. Open wound of head
3. Open wound of thorax
4. Assault (homicide) by handgun discharge
5. Open wound of neck

Hang on for the next part: building the Perl script and dictionaries to process these automatically and ingest them into Elasticsearch.

–Bryan
