This post will illustrate how to make a rough estimate of the dollar impact data entry errors have on common crude testing measurements. It is part of our series on estimating human errors in crude quality and measurement. It can be easy for an oil company to invest too little in reducing human errors, not because they are not important, but because they are hard to measure. The examples in the series outline rough calculations measurement supervisors can use to decide whether a specific source of human error is worth further investigation.
(10 minute read)
Read the previous article in the series here.
The impact of typos on your testing business is hard to measure. It would be nice if you could look up error estimates and impacts in a product brochure or an ASTM standard, but unfortunately that’s not the case. However, that doesn’t mean the effects of these errors are insignificant, or not worth measuring. In this post, we are going to outline rough estimates of economic impact, using conservative values for error rates for three types of data entry error. Each section includes estimates based on test type and facility configuration, as each business is unique. In all cases, we arrive at a large number, anywhere from $100,000 per year to $2M per year for a 10,000 bpd facility.
|Why did we write this post? The first thing we did when we set out on our mission to improve crude oil testing was to build our own lab, with all the equipment typically found in the field. We figured that walking a mile in the shoes of a field worker was the best way to gain insight into how to improve their lives and, ultimately, their business. We expected human error to be a significant part of testing discrepancies, but we didn’t expect it to be as significant as it was. The most common recurring error in our lab after a year of testing? Data entry.|
‘Data entry error’ is a fancy phrase for ‘typos’. They are easy to forget about, because they take place after testing procedures are finished (i.e. all the ‘hard stuff’ is done). However, data entry errors make a big difference to the bottom line, because they don’t follow the usual rules of error distributions: when a typo does occur, it is likely to be very large. Not only can data entry errors cause large monetary losses, but they also concentrate these losses into a handful of shipments. This concentrated effect makes custody disputes likely on these purchases. Since data entry is the last place people think to look for errors, these disputes can take lots of time and money to resolve.
Typos differ from more common error types in that they have a very unusual distribution: typically no errors in 95+% of records, but a high probability of very large errors in the few records that are wrong.
There is lots of academic literature on data entry errors, especially in the study of medical records. While different settings produce very different findings (including error rates as high as 25% of records), a survey of this literature suggests that finding data entry errors in at least 1-2% of records is reasonable for your average crude facility. In order to form a calculation, let’s imagine a typical oil testing operation where every measurement (e.g. of density, S&W, or sulphur content) requires two data entry steps:
Step 1: In the lab, a technician records the reading from the instrument on a piece of paper.
Step 2: Someone transcribes the paper testing records into a database for quality management or production accounting.
Based on this, we will assume errors are found in 2-4% of entries for our calculations (1-2% in each step). The most common types of data entry errors for these numerical measurements are:
- Single-digit typos,
- Incorrect decimal placement, and,
- Misaligned data transcription (i.e. mixing up results between two samples).
Each of these gives a different expected error distribution, but all allow for very large errors with high probability.
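The 2-4% figure above simply adds the two steps together. As a quick check, here is a minimal sketch of the chance that a record picks up at least one typo across two steps, assuming the steps are independent and share the same per-step rate (the function name is ours, for illustration only):

```python
def combined_error_rate(per_step_rate: float, steps: int = 2) -> float:
    """Chance that a record picks up at least one typo across independent steps."""
    return 1.0 - (1.0 - per_step_rate) ** steps


for rate in (0.01, 0.02):
    print(f"{rate:.0%} per step -> {combined_error_rate(rate):.2%} of records")
```

Under those assumptions the combined rate lands just under the simple sum, so rounding to 2-4% changes very little.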
"Just because some testing errors are hard to measure doesn't mean they are not worth estimating."
- Single-Digit Typos
A single-digit typo is, as the name suggests, an incorrectly entered value in a single digit of a measurement result. The impact of these typos on the overall result depends on which digit the typo occurs in. The chart below shows the distribution of single-digit typos by error magnitude for density measurements assuming a random distribution of values, where the values are reported to a resolution of 0.1 kg/m3.
What does this image mean? 50% of the errors have a magnitude of greater than 10 kg/m3! This is much larger than most expected instrument errors. Using our error assumption of 2-4% of all density measurements, single-digit typos are expected to produce an aggregate error of 2.0-4.1 kg/m3 per measurement across all samples. Note that this error is much larger than the reported instrument error for any density meter.
What does this translate to? $580,000 - $1,160,000 per year (for a 10,000 bpd facility). Applying the same calculation to sulphur content and S&W (assuming less than 10% water) gives an expected aggregate error of 0.04-0.07% for each. This error range corresponds to a potential cost of single-digit typos of $90,000 - $180,000 per year for S&W and $280,000 - $560,000 per year for total sulphur in a 10,000 barrels-per-day facility.
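For readers who want to rebuild the density figures, here is a rough sketch that lists every possible single-digit typo and averages the result. It rests on our own simplifying assumptions: a four-digit reading such as 823.1 kg/m3, every digit position equally likely to be mistyped, and every wrong digit equally likely. The names are illustrative only.

```python
from itertools import product

# Place values for a four-digit density such as 823.1 kg/m3 (assumed format).
PLACE_VALUES = (100.0, 10.0, 1.0, 0.1)


def single_digit_typo_errors(place_values=PLACE_VALUES):
    """List the error size of every (position, true digit, typed digit) typo."""
    errors = []
    for place, true_digit, typed_digit in product(place_values, range(10), range(10)):
        if true_digit != typed_digit:  # identical digits are not a typo
            errors.append(abs(true_digit - typed_digit) * place)
    return errors


errors = single_digit_typo_errors()
mean_error = sum(errors) / len(errors)
share_10_plus = sum(e >= 10 for e in errors) / len(errors)

print(f"mean typo size: {mean_error:.1f} kg/m3")
print(f"share of typos of 10 kg/m3 or more: {share_10_plus:.0%}")
for rate in (0.02, 0.04):
    print(f"{rate:.0%} error rate -> {rate * mean_error:.1f} kg/m3 aggregate")
```

With these assumptions, half of all typos land in the hundreds or tens digit, and multiplying the mean typo size by the 2-4% error rate gives an aggregate in line with the 2.0-4.1 kg/m3 quoted above.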
- Incorrect Decimal Placement
Essentially, this type of data entry error is mistyping the decimal (i.e. ‘0.01%’ gets entered as ‘0.1%’). As a result, these errors are very large when they do happen, changing the measured value by at least 10x. Because these errors are so large, they can sometimes produce values that do not make physical sense, and therefore would (hopefully!) be easily detected before they impacted any key business decision.
For example, a density value in kg/m3 makes no physical sense for oil when multiplied or divided by 10 (e.g. 823.1 kg/m3 becomes 82.31 kg/m3 or 8231 kg/m3). However, S&W and sulphur content measurements both produce believable values when multiplied or divided by 10 (e.g. 0.05% becomes 0.5%). If we assume for simplicity that every misplaced decimal shifts the value by exactly one place (a factor of 10), a 2-4% error rate translates into an aggregate error of 20-40% of the average value measured across a whole facility.
Taking the pipeline specification for sweet crude (0.5% for S&W and 0.5%wt for total sulphur) as a typical measurement value, the estimated effect of decimal place dislocations is 0.1-0.2%. The potential cost of decimal place errors translates to $240,000 - $480,000 per year for S&W and $800,000 - $1,600,000 per year for total sulphur in a 10,000 barrels-per-day facility.
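The arithmetic behind the 0.1-0.2% figure can be sketched in a few lines. This follows the simplification used above (each bad record is off by roughly 10 times the typical value); the function and variable names are ours, for illustration only.

```python
def decimal_shift_aggregate(error_rate, typical_value, shift_factor=10):
    """Aggregate error = share of bad records x (shift factor x typical value)."""
    return error_rate * shift_factor * typical_value


SPEC_VALUE = 0.5  # percent; sweet crude pipeline spec used as the typical value

for rate in (0.02, 0.04):
    aggregate = decimal_shift_aggregate(rate, SPEC_VALUE)
    print(f"{rate:.0%} error rate -> {aggregate:.1f}% aggregate error")
```

Swapping in your own typical S&W or sulphur value is a quick way to see how sensitive the estimate is to the crude you handle.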
- Misaligned Data Transcription
Remember when we talked about data entry involving two steps? This type of error occurs in the second step, when an operator copies paper data to an electronic version. Copying errors generally fall into two categories, differentiated by properties of the tested oil:
Category 1: Miscopied measurements of the same property
Suppose you are copying density measurements for 50 samples from your instrument user interface into your reporting database. At some point you skip an entry and then end up copying values into the wrong columns. For example: ‘sample 35’ measurement is entered as ‘sample 34’, ‘36’ is entered in the column for ‘35’ etc. All the mistyped density measurements were still density measurements, but for different samples.
Category 2: Miscopied measurements of different properties
Suppose you have S&W and sulphur content measurements done for a series of samples at a sour-crude processing facility. For one or more samples, total sulphur values get copied into the S&W column or vice versa.
The average error resulting from miscopied measurements of the same property (e.g. density) is equal to the standard deviation of the total distribution of measurements of that type at that facility. For example, a trucking terminal taking in a broad spectrum of light crudes, heavy crudes and condensates could have a standard deviation of 80 kg/m3. Miscopied density measurements would therefore be an average of 80 kg/m3 off from the correct value. Assume again an overall error rate of 2-4% and the aggregate error is 1.6-3.2 kg/m3 across the facility.
Translation? The potential cost of miscopied density measurements calculates to $455,000-$910,000 per year. This impact is about the same as the one we estimated for single-digit typos above.
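To see why the facility-wide spread is a sensible stand-in for the size of a miscopied value, here is a small simulation sketch. The facility is hypothetical: we assume bell-shaped (normal) densities centred on 850 kg/m3 with an 80 kg/m3 standard deviation, and a skipped row that gives every sample its neighbour's value.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical facility: densities centred on 850 kg/m3 with an 80 kg/m3 spread.
densities = [random.gauss(850.0, 80.0) for _ in range(20_000)]

# A skipped row means each sample is recorded with its neighbour's value.
shifted = densities[1:] + densities[:1]
offsets = [abs(true - copied) for true, copied in zip(densities, shifted)]

spread = statistics.stdev(densities)
mean_offset = statistics.fmean(offsets)

print(f"standard deviation: {spread:.0f} kg/m3")
print(f"mean miscopy error: {mean_offset:.0f} kg/m3")
for rate in (0.02, 0.04):
    print(f"{rate:.0%} error rate -> {rate * mean_offset:.1f} kg/m3 aggregate")
```

For a normal distribution the mean gap between two unrelated samples comes out slightly above one standard deviation, so using the standard deviation itself, as we did above, keeps the estimate on the conservative side.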
|Even larger effects might be found with water cuts in a terminal or battery site accepting emulsion from multiple producers. In this scenario, a standard deviation of water cuts across all shipments could be as high as 20%. The same calculation as above predicts an aggregate error of 0.4-0.8% across the facility. The potential cost of miscopied water cuts translates to $956,000-$1,912,000 per year in this facility at 10,000 bpd.|
Miscopied measurements of different types can have a large impact, especially when the values for the two measurements lie in different ranges but still look believable in each other's column. For example, consider a facility that processes dry sour crude with an average water cut of 0.3% (standard deviation: 0.1%) and an average sulphur content of 2%wt (standard deviation: 0.5%). In this facility, water cuts mistyped as sulphur content values and vice versa would create seemingly believable results, but create large errors. Applying the same calculations as above, the expected aggregate error for both water cut and sulphur content as a result of miscopied measurements between the two columns is 0.03-0.07%. This corresponds to a potential cost of $280,000-$560,000 per year for sulphur and $84,000-$167,000 per year for S&W in this facility at 10,000 bpd.
Any one of these data entry errors could cost your business anywhere from $100,000 to $2,000,000 per annum (10,000 bpd facility). Read that again. You could be losing two million dollars per year, all because of typos! That’s not all the bad news: What makes data entry errors different from other types of error is that their financial impact is concentrated in a very small number of samples. This can make them more dangerous to operations for several reasons:
- These errors could easily slip past quality control audits that are applied to a small ‘representative’ subset of samples.
- Because they have a huge impact on a single transaction, they are more likely to trigger disputes.
- Data entry is often the last place we think to look for errors, and typos are very hard to track down when there is little or no audit trail, so these disputes can also prove very costly and time-consuming to resolve.
Human errors are very large, larger than instrument error in most cases. If you are looking to estimate the costs of typos to your business, we hope that this post will give you a place to start. Read more about the neglected problems that matter in oil testing in our previous article. The best investments in error reduction are those that target the largest errors, not necessarily those that are the easiest to measure.
Reducing data entry errors will have a higher ROI than instrument upgrades. Want to know how much these errors are affecting your business?