Case #533

Problems with "missing values" in the NordDRG system

Added by Ralph Dahlgren over 1 year ago. Updated 6 months ago.

Status: Further active
Start date: 2017-02-16
Priority: Minor
Spent time: -
Target version: -
Initiator: Sweden
Target year:
Case type: Minor
Owner / responsible:
MDC: GEN
Old forum status:
Target Grouper: COMMON


There has been a problem in the NordDRG system and in the grouper for many years.
In Sweden, this problem has become particularly apparent in connection with the production of the 2015 version of PAR (PAR = Patient Register). The PAR problem concerns "missing values" for the variable AGE, but it can also apply to DURATION.
PAR, like all other data sets, may contain "missing values" for these or other variables; missing values in data sets are normal, and no data set is perfect.
Given this, the NordDRG system should handle missing data adequately.

For Sweden, PAR2015SV (SV = inpatient care) contains more than 30 thousand more "missing values" for date of birth than normal. The date of birth is used to calculate the age in days, which NordDRG requires, so the age also becomes a missing value.

In the program that prepares the input file for grouping with Datawell's grouper, the age variable required by NordDRG (hereafter named AGE) is set to -1 when its actual value is missing.

This is done to prevent an error that could otherwise occur: SAS (Statistical Analysis Software) writes its display representation of a missing value, a dot (.), to the input file for the grouping, which causes Datawell's grouper to abort the grouping process.

Note that negative values of AGE may arise even when no missing values are involved: an age in days calculated from the dates of admission and birth can be negative if these dates are inconsistent with each other. Such values, however, are not "missing".

In Datawell's grouper, this negative value of AGE seems to be treated as a valid value, i.e. as zero or less than 8 days.
In Sweden, this has caused the number of records grouped to DRG Q55N to increase very significantly. DRG Q55N is reached through several rules, which set the AGELIM requirement to either AGE <8 or AGE <28. Records from PAR2015SV with AGE = -1 (actually a missing value) fulfil these conditions, and if the patient data also met the other conditions of a rule, they were grouped to this DRG. Many of the Swedish records with missing AGE have ended up here.

This increase has been observed, and the following attempts to remedy the problem have been made:
1) "-1" has been replaced with "nothing" (empty string) in the input file
2) "-1" has been changed to "99999" (i.e. an unreasonably high value for AGE)

1) did not change the outcome at all. We cannot know the details of what happens when nothing is written for this element, but in all probability the empty string for AGE is transformed to the value 0 (zero). Since 0, like -1, is less than 8 and 28, the outcome does not change.

2) changed the outcome: the number of records grouped to Q55N decreased by more than 10 thousand, to a level significantly below that of previous years (2013-14). This is quite reasonable, since records with missing AGE that were previously grouped to DRG Q55N are now also moved elsewhere.
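The effect of the two attempts can be illustrated with a minimal sketch. It assumes, as the case text suggests but does not confirm, that the grouper reads an empty AGE field as 0 and then applies an AGELIM requirement of the form AGE < 8 or AGE < 28. The function names are hypothetical.

```python
# Hypothetical sketch: how different encodings of a missing AGE behave against
# AGELIM requirements of the form "AGE < limit".
# Assumption (not confirmed by the grouper documentation): an empty field is read as 0.


def parse_age(field: str) -> int:
    """Read the AGE field the way the grouper is assumed to: empty means 0."""
    field = field.strip()
    return int(field) if field else 0


def meets_agelim_lt(field: str, limit: int) -> bool:
    """True if the parsed AGE satisfies the requirement AGE < limit."""
    return parse_age(field) < limit


if __name__ == "__main__":
    for label, field in [("-1 (original workaround)", "-1"),
                         ("empty string (attempt 1)", ""),
                         ("99999 (attempt 2)", "99999")]:
        print(label, meets_agelim_lt(field, 8), meets_agelim_lt(field, 28))
```

Only the 99999 encoding escapes the neonatal age limits. The same value would presumably satisfy any "greater than" age requirement, which may be one of the other problems this workaround causes.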

This has been presented as a solution to the problem, but it causes other problems.

What is of general interest here is that Datawell's grouper seems to lack adequate handling of missing values for numeric variables. There are two such variables in NordDRG: age in days and length of stay in days. The grouping logic can place requirements on them through the columns AGELIM and DUR, respectively. Such a requirement consists of a comparison operator (less than or greater than) and a value with which AGE (or the duration) from the input file is compared according to the operator.

For both age in days and length of stay in days, 0 could have been treated as the lower boundary of valid values, and values below this limit as values that do not meet the requirement. Alternatively, there could be a handling of missing values, preferably the same for both numeric variables in this context. Because the input file for Datawell's grouper (and probably for all groupers) is a text file, the marker for a missing value can hardly be anything but an empty string. With adequate handling of missing values, namely that such a value can never meet a requirement, the grouper would produce adequate grouping results.
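A grouper-side comparison with the handling described above could look roughly like the following sketch. It is not Datawell's implementation. It combines both alternatives from the paragraph (an empty text field means missing, and values below 0 are out of range) so that neither can satisfy an AGELIM or DUR requirement. The function names are hypothetical.

```python
from typing import Optional


def parse_numeric(field: str) -> Optional[int]:
    """Return None for an empty (missing) text field, otherwise the integer value."""
    field = field.strip()
    return int(field) if field else None


def meets_requirement(field: str, op: str, limit: int) -> bool:
    """Evaluate an AGELIM/DUR-style requirement 'value <op> limit'.

    Missing values and values below the lower boundary 0 never meet a requirement.
    """
    value = parse_numeric(field)
    if value is None or value < 0:
        return False
    if op == "<":
        return value < limit
    if op == ">":
        return value > limit
    raise ValueError(f"unsupported operator: {op!r}")
```

With this handling, `meets_requirement("", "<", 28)` and `meets_requirement("-1", "<", 8)` both return False, while `meets_requirement("5", "<", 8)` returns True.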

Unfortunately, there seems to be no such handling, at least in Datawell's grouper. Dealing with this problem in the definition data would require a large number of so-called validation rules in DRGlogic. If a lower limit of AGE could be given in AGELIM, i.e. if ranges could be specified, that could have been a solution to the problem.

Sweden has therefore moved records with missing AGE in the Swedish PAR from Q55N to other DRGs. The result of our changes confirms the hypothesis that "missing values" are not handled and that Datawell's grouper works this way.

There was a proposal to keep the original grouping result, with its relative accumulation of records with missing AGE in Q55N, in order to retain similar behaviour between years.
For Sweden, the decision was to let these more than 30 000 records spread to other DRGs. Sweden will take this into account when using PAR 2015 in work with NordDRG.

This case is therefore raised for discussion: all countries must have the same problem, so the issue needs to be discussed, and if nothing is changed now, a change has to be made for the coming grouper.

Technical changes case #533.xlsx (15.2 KB) Martti Virtanen, 2017-04-18 14:03

TC _#533_C692_2018-03-28.xlsx (16.1 KB) Mats Fernström, 2018-03-28 16:07


#1 Updated by Kristiina Kahur over 1 year ago

Finnish National DRG-centre 2017-2-20

This is a necessary topic to discuss at the meeting. In Finland we haven't had problems with calculating the age (as far as we are aware), but we have had some questions regarding the calculation of the length of stay, both for short therapy and for inpatient cases.
Thanks for addressing the issue.

#2 Updated by Mats Fernström over 1 year ago

Comment by Mats Fernström 2017-02-22
It may seem that 30 000 is not so many in relation to a total of 1.5 million patients in the register, but when almost all of them are grouped incorrectly to a few DRGs in MDC 15, the DRG statistics become seriously wrong.
We have noticed a way to reduce the problem with missing date of birth in the Patient Register (PAR). There is another variable, year of birth, which can be used to calculate the age in days at admission or visit. If we set the date of birth to 2 July in the year of birth, the deviation from the correct date of birth will be at most 182 days, which I think is good enough for patients who are two years or older. Unfortunately, the year of birth is also missing for some records in PAR, so there were still somewhat more than 1 000 patients without known age.
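The imputation described above can be sketched as follows. The function names are hypothetical, and age is taken as the number of days between the (possibly imputed) date of birth and the admission date. Placing the birth date on 2 July keeps the error within 182 days in an ordinary year (183 days in a leap year, when the true birth date is 1 January). The same calculation also shows how inconsistent dates give a negative age that is not a missing value.

```python
from datetime import date
from typing import Optional


def impute_birth_date(birth_date: Optional[date], birth_year: Optional[int]) -> Optional[date]:
    """Use the recorded date of birth; otherwise assume 2 July of the year of birth."""
    if birth_date is not None:
        return birth_date
    if birth_year is not None:
        return date(birth_year, 7, 2)
    return None  # both missing: the age remains unknown


def age_in_days(birth_date: Optional[date], admission: date) -> Optional[int]:
    """Age in days at admission or visit; negative if the dates are inconsistent."""
    if birth_date is None:
        return None
    return (admission - birth_date).days


if __name__ == "__main__":
    dob = impute_birth_date(None, 1950)
    print(dob, age_in_days(dob, date(2015, 3, 1)))
    # Worst-case deviation from the true birth date in an ordinary year
    print((date(2015, 7, 2) - date(2015, 1, 1)).days,
          (date(2015, 12, 31) - date(2015, 7, 2)).days)
```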
Missing or negative duration is not a big problem in an outpatient dataset: anything other than zero can be changed to zero before the batch grouping is done. Missing or negative duration in an inpatient dataset can be replaced by 1 (one day) before the grouping, but it could be wise to flag these cases in some way so that they can be excluded when calculating average lengths of stay.
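A minimal sketch of this preparation step, assuming each record carries a duration (None when missing) and an inpatient/outpatient indicator. The names and the tuple layout are hypothetical. The returned flag marks replaced values, so flagged inpatient cases can be left out of average length-of-stay calculations.

```python
from typing import Iterable, Optional, Tuple


def prepare_duration(duration: Optional[int], inpatient: bool) -> Tuple[int, bool]:
    """Return (duration to send to the grouper, True if the value was replaced)."""
    if not inpatient:
        # Outpatients: anything other than zero (including missing) is changed to zero.
        return 0, duration != 0
    if duration is None or duration < 0:
        # Inpatients: missing or negative duration becomes 1 day and is flagged.
        return 1, True
    return duration, False


def mean_length_of_stay(prepared: Iterable[Tuple[int, bool]]) -> Optional[float]:
    """Average length of stay over prepared inpatient records, excluding flagged ones."""
    kept = [days for days, flagged in prepared if not flagged]
    return sum(kept) / len(kept) if kept else None
```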
Regardless of how one prepares the datasets, it would be desirable for NordDRG to mark the cases where age or duration is missing or negative. This makes us aware of the problem and serves as a reminder if the dataset preparation has been forgotten. I think this is possible within the existing NordDRG logic by letting these cases go to Z-DRGs (formerly DRG 470). I suggest:
Z90 ‘Age is missing’
Z91 ‘Age is negative’
Z92 ‘Duration is missing’
Z93 ‘Duration is negative’
In the Common version, these groups should correspond to DRG 470 with RTC codes carrying the texts above.
The rules for these DRGs must be placed at the beginning of Drglogic, in the Swedish version immediately after the rules for DRG Z70 (Principal diagnosis is missing).
In the rules for Z91 'Age is negative' and Z93 'Duration is negative', there must be "<0" in the fields AGELIM and DUR, respectively.
In the rules for Z90 'Age is missing' and Z92 'Duration is missing', there must be just a minus sign in the fields AGELIM and DUR, respectively. Then, of course, the groupers (Datawell's and others) must interpret this minus sign as meaning that the data is missing. This shouldn't be a problem, however, because the principle already exists: we have the minus sign in the field ICD in the rules for DRG Z70 (Principal diagnosis is missing), and we used to have the minus sign in the field SEX in the rules for DRG Z73 (Gender of patient is missing).
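The proposed rules could be evaluated roughly as in the following sketch, which models only the four error rows at the start of Drglogic and only the AGELIM and DUR columns. Reading "-" as "missing" and "<0" as "negative" follows the proposal; the table layout, the names, and the assumption that an empty input field means missing are illustrative.

```python
from typing import List, Optional, Tuple

# Hypothetical excerpt of the first rows of Drglogic under the proposal: (DRG, AGELIM, DUR).
# "" = no requirement, "-" = value missing, "<0" = value negative.
ERROR_RULES: List[Tuple[str, str, str]] = [
    ("Z90", "-", ""),   # Age is missing
    ("Z91", "<0", ""),  # Age is negative
    ("Z92", "", "-"),   # Duration is missing
    ("Z93", "", "<0"),  # Duration is negative
]


def parse(field: str) -> Optional[int]:
    """Empty input field = missing (None); otherwise the integer value."""
    field = field.strip()
    return int(field) if field else None


def requirement_met(requirement: str, value: Optional[int]) -> bool:
    if requirement == "":
        return True
    if requirement == "-":
        return value is None
    if requirement == "<0":
        return value is not None and value < 0
    raise ValueError(f"requirement not modelled in this sketch: {requirement!r}")


def first_error_drg(age_field: str, dur_field: str) -> Optional[str]:
    """Return the first matching error DRG, or None to continue with ordinary rules."""
    age, dur = parse(age_field), parse(dur_field)
    for drg, agelim, durlim in ERROR_RULES:
        if requirement_met(agelim, age) and requirement_met(durlim, dur):
            return drg
    return None
```

Because the rules are evaluated in order, a record with both age and duration missing ends up in Z90; the duration rules are only reached when the age is valid.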

#3 Updated by Martti Virtanen over 1 year ago

2017-03-01 Martti Virtanen
NordDRG grouping logic depends on both the age of the patient and the length of stay. Therefore it cannot group cases with missing data on these items.
To avoid grouping to false DRGs, it is reasonable (and easy) to block this with two simple rules.
Cases without age (age=' ') or with negative age are grouped to DRG Z90 'Error in age'.
Cases without length of stay or with negative length of stay are grouped to DRG Z92 'Error in length of stay'.
These rules are valid for all versions. The national code can be modified by the national organisations.

It is complicated (and probably not necessary) to differentiate ' ' from a numerical value <0, because the ASCII value of ' ' is less than that of '0'. Analysing the cause of the error should be the responsibility of the local organisations.
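Martti's point can be illustrated under one assumption that the case implies but does not spell out: that the grouper compares the raw text of the field with "0" character by character. In ASCII, the space (32), the minus sign (45) and the dot (46) all come before the digit 0 (48), so a blank field, a negative number and even SAS's "." all satisfy "<0". The function name is hypothetical.

```python
def text_less_than_zero(field: str) -> bool:
    """Assumed grouper behaviour: compare the raw field text with "0" as a string.

    Python compares strings by code point, which matches ASCII order for these characters.
    Fields are assumed not to be padded with leading spaces (" 5" would also compare < "0").
    """
    return field < "0"


if __name__ == "__main__":
    for field in ["", " ", "-1", ".", "0", "7", "12", "99999"]:
        print(repr(field), text_less_than_zero(field))
```

The sketch only covers the "<0" check: plain text comparison does not order multi-digit numbers numerically ("12" < "8" is True as text), so other AGELIM and DUR limits presumably need a numeric comparison.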

#4 Updated by Martti Virtanen over 1 year ago

  • Status changed from Active to Rejected


#5 Updated by Martti Virtanen over 1 year ago

  • Status changed from Rejected to Active

#6 Updated by Martti Virtanen over 1 year ago

2017-03-13 Expert group
The case was not accepted. The problem remains. With this system, a missing time label can be substituted by any character with an ASCII value less than '0'. Negative values behave similarly.
The technical changes illustrate the model, but it is not accepted.

#7 Updated by Martti Virtanen 7 months ago

2018-02-20 Martti Virtanen
To remind everybody: the problem is not solved.
Mats's point is also valid, so a solution is needed.
I think that the proposed model is still valid.

#8 Updated by Mats Fernström 7 months ago

  • File Technical changes case _540-5 Comments SWE.xlsx added

Mats Fernström, NPK Sweden 2018-03-08 (NPK ID C692)
I didn't know that <0 catches both negative and missing values. With this information, Martti's simplified solution is appealing. But I'm afraid we cannot use it for duration, because outpatients have no duration, and then all outpatients would end up in DRG Z92/470 'Error in length of stay'. Missing or negative duration among inpatients in our Patient Register is very rare, and we have a method to handle these cases (see my comment of 2017-02-22), so I think that we can leave the duration problem as it is.
But we want Martti's solution for negative or missing age. The only question is where to place the rules. Our first thought was to place one rule, as suggested by Martti, at the beginning of each of the three areas for the different types of care, leading to DRG Z90R 'Error in age, primary care', DRG Z90O 'Error in age, outpatients' and DRG Z90N 'Error in age, inpatients', but on second thought we think that this is "overkill". In approximately 90 % of the DRG groups, age is irrelevant, and then a diagnosis- or procedure-based DRG is better than DRG Z90 'Error in age', even though the age is wrong.
An alternative is therefore to insert a rule for DRG Z90 'Error in age' before each rule for an age-dependent DRG; these Z90 rules would then need the same demands (MDC, Dgcat, Procpro etc.) as the following age-dependent DRG. That would result in at least 60 rules, which is "much ado about nothing".
As mentioned before (2017-02-22), the most serious effect of a wrong age is in the neonatal cases, so we think that it is good enough to have rules for those. See Technical changes case _533 Comment SWE.xlsx for details.
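The trade-off can be illustrated with a first-match rule list in which a Z90 rule, carrying the same demand, precedes only the age-dependent neonatal DRG. Apart from Q55N and its AGE < 28 limit, everything here is hypothetical: the single "demand" column stands in for the real MDC, Dgcat and Procpro columns, and the other DRG names are placeholders. A record with missing or negative age lands in Z90 only if it would otherwise have reached the neonatal rule; elsewhere it keeps its diagnosis-based DRG.

```python
from typing import List, Optional, Tuple

# Hypothetical, heavily simplified Drglogic excerpt: (DRG, demand, AGELIM).
# "demand" stands in for the real MDC/Dgcat/Procpro columns.
# AGELIM: "" = no age requirement, "err" = age missing or negative, "<N" = AGE < N.
RULES: List[Tuple[str, str, str]] = [
    ("Z90", "neo", "err"),      # error rule with the same demand as the next rule
    ("Q55N", "neo", "<28"),     # age-dependent neonatal DRG
    ("NEO_OTHER", "neo", ""),   # hypothetical neonatal DRG without age requirement
    ("DX_BASED", "other", ""),  # hypothetical diagnosis-based DRG, age irrelevant
]


def parse_age(field: str) -> Optional[int]:
    """Empty field = missing (None); otherwise the integer age in days."""
    field = field.strip()
    return int(field) if field else None


def age_ok(agelim: str, age: Optional[int]) -> bool:
    if agelim == "":
        return True
    if agelim == "err":
        return age is None or age < 0
    if agelim.startswith("<"):
        return age is not None and age < int(agelim[1:])
    raise ValueError(f"AGELIM not modelled in this sketch: {agelim!r}")


def group(demand: str, age_field: str) -> Optional[str]:
    """Return the DRG of the first rule whose demand and AGELIM both match."""
    age = parse_age(age_field)
    for drg, rule_demand, agelim in RULES:
        if rule_demand == demand and age_ok(agelim, age):
            return drg
    return None
```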

#9 Updated by Martti Virtanen 7 months ago

2018-03-09 Martti Virtanen
The duration for outpatients in Sweden and Norway should be 0 according to previous agreements.
This requires a special rule when transferring data to the grouper.
If it is left empty, the proposed rule will obviously not work. However, if desired, we can place the rule for duration at ord = 3999D999999, and it would then only affect inpatient care.

In the combined definition table set, the drglogic table has 690 rules that include age. They would not each need a separate rule, but it would still be a lot of new rules.
Missing age should not be possible in a modern information system, and therefore the simple rule at the beginning is not necessarily that bad.
(Mats, please add the correct proposal for technical changes; the current one is from another case.)

#10 Updated by Mats Fernström 6 months ago

  • File deleted (Technical changes case _540-5 Comments SWE.xlsx)

#11 Updated by Mats Fernström 6 months ago

Mats Fernström, NPK Sweden 2018-03-28
Here is the correct file, TC _#533_C692_2018-03-28.xlsx.
