Use of Sporadically Incorrect Data for
Historical Simulations
I'd like to share some thoughts on the concern over data
vendors whose data contains errors. I do not know how bad some of this
data is, but in general, I believe the following principles apply.
For historical testing, although using perfectly correct
data might be ideal for perfect optimization, I do not believe historical
data needs to be this good to develop a dependable and profitable trading
system for real-world use. Assuming the errors in data are relatively
infrequent and not obviously absurd, there should be little concern
in using this kind of data for system development and testing.
Since markets tend to look "continuous" on charts,
with only occasional discontinuities that make "common sense,"
you can generally spot data that is grossly in error and make estimated
corrections to this kind of error. Errors of a few ticks
on either highs or lows of a daily range (or other sampling period)
may be considered "noise." Errors in opens or closes within
high-low ranges may or may not be "common-sense" detectable,
so this data "noise" may be of most concern. However, although
market behaviors do tend to repeat, rarely, if ever, do patterns of
behavior duplicate themselves tick-for-tick.
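As a rough illustration of the "common-sense" screening described above, the
following is a minimal Python sketch, not a definitive method. It assumes daily
bars stored as (open, high, low, close) tuples; the function name
flag_suspect_bars and the jump_factor threshold are hypothetical choices made
for this example. It flags bars whose open or close falls outside the high-low
range, and closes that spike far away from both neighboring closes and then
come straight back. A move that persists on the next bar is treated as a real
discontinuity and is not flagged.

```python
from statistics import median

def flag_suspect_bars(bars, jump_factor=8.0):
    """Return sorted indices of bars that look grossly wrong.

    bars: list of (open, high, low, close) tuples, oldest first.
    jump_factor: hypothetical threshold; a close is a "spike" when it sits
    more than jump_factor typical moves away from both neighboring closes.
    """
    suspects = set()

    # Check 1: internal consistency of each bar.
    for i, (o, h, l, c) in enumerate(bars):
        if h < l or not (l <= o <= h) or not (l <= c <= h):
            suspects.add(i)

    # Check 2: isolated spikes relative to the typical close-to-close move.
    moves = [abs(bars[i][3] - bars[i - 1][3]) for i in range(1, len(bars))]
    typical = median(moves) if moves else 0.0
    if typical > 0:
        limit = jump_factor * typical
        for i in range(1, len(bars) - 1):
            up = bars[i][3] - bars[i - 1][3]    # move into this close
            back = bars[i][3] - bars[i + 1][3]  # move out of this close
            # Same sign means the close sticks out above (or below) both
            # neighbors, so the price jumped away and immediately returned.
            if abs(up) > limit and abs(back) > limit and up * back > 0:
                suspects.add(i)

    return sorted(suspects)

bars = [
    (100.0, 101.0, 99.0, 100.5),
    (100.5, 101.5, 100.0, 101.0),
    (101.0, 102.0, 100.5, 101.5),
    (101.5, 160.0, 101.0, 158.0),   # close spikes and comes straight back
    (101.5, 102.5, 101.0, 102.0),
    (102.0, 103.0, 101.5, 102.5),
    (102.5, 102.0, 103.0, 102.8),   # high is below low
]
print(flag_suspect_bars(bars))  # [3, 6]
```

Flagged bars would still need a human look; the sketch only narrows down where
to look and does not attempt the estimated corrections themselves.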
Therefore, to assume that a trading-decision strategy
must be based on high precision tick-range patterns or indicators is
asking for trouble -- this would be indicative of over-optimization.
Shallow-sensitivity "robust" optimization, in my opinion,
is quite desirable, but steep-sensitivity optimization is likely to
be disastrous. (Here, "sensitivity" refers to the change in
simulation results as the characteristics of a market change over time,
and shallow/steep refer to abruptness of the change.)
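One possible way to put a number on this notion of "sensitivity" is sketched
below in Python. It is only one reading of the idea, under stated assumptions:
the trade results are available as a list ordered in time, the history is cut
into equal windows, and "steepness" is taken to be the largest change in result
between adjacent windows, scaled by the average size of a window's result. The
names window_results and steepness are hypothetical.

```python
def window_results(trade_pnls, n_windows):
    """Sum trade results over consecutive, equal-sized windows (oldest first).

    Trades left over after the last full window are ignored.
    """
    if n_windows < 2:
        raise ValueError("need at least two windows")
    size = len(trade_pnls) // n_windows
    if size == 0:
        raise ValueError("need at least one trade per window")
    return [sum(trade_pnls[k * size:(k + 1) * size]) for k in range(n_windows)]

def steepness(results):
    """Largest change between adjacent windows, scaled by the average
    absolute window result. Near 0 is "shallow"; large values are "steep"."""
    scale = sum(abs(r) for r in results) / len(results)
    if scale == 0:
        return 0.0
    return max(abs(b - a) for a, b in zip(results, results[1:])) / scale

steady = [1] * 12
abrupt = [5] * 6 + [-5] * 6
print(steepness(window_results(steady, 4)))  # 0.0
print(steepness(window_results(abrupt, 4)))  # 2.0
```

The window count and any cutoff between "shallow" and "steep" are judgment
calls that this sketch leaves open.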
My presumption also takes into account the total number
of trades that a strategy may generate over its useful lifetime. Fewer
trades imply longer trending durations and, therefore, the "noise"
relative to the magnitudes of the moves will be relatively small and
insignificant. As the number of trades increases for a given lifetime,
the trend durations shorten, moves generally become much smaller, and relative
"noise" becomes more significant.
However, assuming that a sufficient number of trades is
generated both in historical simulation and in real trading so that,
statistically, no single trade dominates the overall results, a robust
strategy that produces consistent "small advantages" will,
by design, be inherently "noise immune."
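The condition that no single trade dominates the overall results can be checked
directly on a list of trade results. The Python sketch below is a simple
illustration under that assumption; the function name trade_dominance and the
returned field names are hypothetical. It reports the net result, the net
result with the best trade removed, and the best trade's share of the total.

```python
def trade_dominance(trade_pnls):
    """Summarize how much the overall result leans on its single best trade."""
    total = sum(trade_pnls)
    best = max(trade_pnls)
    return {
        "total": total,
        "without_best": total - best,
        # Share is only meaningful when the strategy is net profitable.
        "best_share": best / total if total > 0 else None,
    }

spread_out = [10, 12, -5, 8, 11, -4, 9]
one_big_win = [100, -5, -5, -5]
print(trade_dominance(spread_out)["without_best"])   # 29
print(trade_dominance(one_big_win)["without_best"])  # -15
```

In the second list the whole profit comes from one trade, so a data error on
that one day could flip the simulated result from a gain to a loss.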
Bottom line: So what if the historical data is somewhat
in error -- the future is likely to produce data that differs somewhat
from the past anyway, so a profitable trading strategy for any given
market should be tolerant of some reasonable variation in data, whether
that data be historical or yet-to-occur as the future develops. A "good"
trading system should be reasonably "noise" immune, and data
that is somewhat "noisy" can be quite adequate for trading
strategy development purposes.
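One way to test whether a strategy tolerates "reasonable variation in data" is
to add a few ticks of random error to the historical prices and re-run the
simulation many times. The Python sketch below illustrates the idea only. The
tick size, the error range, and especially toy_strategy_pnl (a placeholder
rule, long while the close is above its trailing average) are assumptions made
for this example and are not taken from any particular trading system.

```python
import random

def add_tick_noise(closes, tick_size, max_ticks, rng):
    """Shift each close by a random whole number of ticks
    in the range [-max_ticks, +max_ticks]."""
    return [c + rng.randint(-max_ticks, max_ticks) * tick_size for c in closes]

def toy_strategy_pnl(closes, lookback=5):
    """Placeholder strategy (hypothetical): hold a long position for one bar
    whenever the close is above the average of the previous `lookback`
    closes. Returns total points gained."""
    pnl = 0.0
    for i in range(lookback, len(closes) - 1):
        avg = sum(closes[i - lookback:i]) / lookback
        if closes[i] > avg:
            pnl += closes[i + 1] - closes[i]
    return pnl

def noise_test(closes, strategy, tick_size=0.25, max_ticks=2, runs=200, seed=1):
    """Return (result on original data, worst noisy result, best noisy result)."""
    rng = random.Random(seed)  # fixed seed so the test is repeatable
    base = strategy(closes)
    noisy = [strategy(add_tick_noise(closes, tick_size, max_ticks, rng))
             for _ in range(runs)]
    return base, min(noisy), max(noisy)

# A steady uptrend: the toy strategy's result should barely move under noise.
closes = [100 + 0.5 * i for i in range(60)]
base, worst, best = noise_test(closes, toy_strategy_pnl)
print(base, worst, best)
```

A narrow spread between the worst and best noisy runs, relative to the result
on the original data, suggests the strategy does not depend on tick-level
precision; a wide spread is a warning sign of over-optimization.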
Having said all this, would I, or could I, trust using
potentially flaky recent data to create real-time trading orders, for
either day-trades or position trades? If I did not want to take time
to look over data for obvious gross errors before mechanically (blindly)
generating trading orders, using unreliable data for this purpose could
well result in some very expensive losing trades. (There could also
be some serendipitous profitable trades, but I wouldn't hold my breath!)
So, in this context, having reliably accurate data is imperative and
I would definitely want to use a vendor whose data I could trust.
An understanding of the strengths, weaknesses, and underlying
design of one's trading strategy, coupled with the emotional considerations
of trust, confidence, and belief in that trading model, would dictate
one's comfort level in using data that could contain sporadic
errors. Even if I were willing to take time to carefully examine all data
for "common-sense" correctness, I might not be too comfortable
using data that would require my constant vigilance, even though my
"noise immune" trading strategy would probably produce reliably
profitable results over the longer term. Bottom line: In real-time trading,
for peace of mind, get the most reliable data available.