HIGH-FREQUENCY DATA
Studying the financial market using a large number of observations, often referred to as high-frequency data (HFD), undoubtedly improves our understanding of the market. Financial transaction data are now recorded at steadily increasing levels of detail owing to advances in computer technology and data storage, which have brought a dramatic fall in the cost of data recording.
High-frequency data in finance refers to an extremely large body of data comprising the full record of transactions and their associated characteristics, observed at frequencies higher than daily (Engle, 2000). According to Dacorogna et al. (2001), “The number of observations in one single day of a liquid market is equivalent to the number of daily data within 30 years” (p. 6). The structure of HFD depends on the institution’s policy with regard to the production and gathering of data (Brownlees & Gallo, 2006).
HFD, such as intra-day data, have unique features that data recorded at lower frequencies lack, and should therefore be the primary source for those who analyse and seek to understand financial markets. HFD have been widely used by both the academic community and industry to study market microstructure and related issues. The analysis of HFD is valuable for many issues in finance, including the detection of price movements, the statistical properties of the market, competition among related markets, the strategic behaviour of market participants, and the modelling of real-time market dynamics (Yan & Zivot, 2003). Empirical studies on the use of HFD are offered by Andersen (2000), Engle (2000), Ghysels (2000), and Dacorogna et al. (2001).
With the introduction of HFD come new challenges associated with the processing and analysis of these data. On the processing side, HFD may contain a variety of erroneous or misleading observations and data gaps that do not reflect actual market trading activity. Some of these errors may result from computer system errors or from internal system procedures that use dummy ticks (Dacorogna et al., 2001; Yan & Zivot, 2003).
For instance, recording failures can lead to data gaps, and database viruses can lead to missing observations in the ordered time series (Yan & Zivot, 2003). Consequently, these data need to be filtered before any direct analysis in order to eliminate observations that appear to be incompatible with existing market activity. A clean dataset is an essential precondition for the next step, the analysis of the data. Failure to recognise these errors in the data sets may lead to ambiguous results in the statistical analysis. The literature contains several contributions on HFD filtering (Blume & Goldstein, 1997; Dacorogna et al., 2001; Falkenberry, 2002; Brownlees & Gallo, 2006; Oomen, 2006).
On the analysis side, the growth of HFD raises the question of which analytical approach should be used to deal with such a vast amount of data, which typically exhibit strong periodic patterns in market activity. One major issue with higher-frequency data is the choice of time interval, as it is one of the central decisions in the analysis of the financial market. HFD allow us to study the financial market at varying time intervals, from microseconds to years. Some time intervals in an HFD set may not contain any transactions. Moreover, because transaction data are inherently irregularly spaced in time, the analyst must aggregate the time series to fixed time intervals (Russell, 1999).
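The aggregation of irregularly spaced transactions to fixed time intervals, described above, can be sketched in Python as follows. This is a minimal illustration rather than a method taken from the cited literature. It assumes previous-tick aggregation, in which each fixed interval takes the last observed trade price and an interval without transactions carries the previous price forward; this is only one of several possible conventions. The function name `aggregate_ticks`, the one-minute interval, and the sample ticks are all hypothetical.

```python
from datetime import datetime, timedelta


def aggregate_ticks(ticks, start, end, step):
    """Aggregate irregular (timestamp, price) ticks onto a fixed time grid.

    Each interval is (previous grid point, grid point]. Returns a list of
    (interval_end, last_price, n_trades) tuples. last_price is the most
    recent trade price at or before interval_end (previous-tick rule, an
    assumption of this sketch), or None if no trade has occurred yet.
    """
    ticks = sorted(ticks, key=lambda tick: tick[0])
    bars = []
    i = 0
    last_price = None

    # Ticks at or before the start only seed the carried-forward price.
    while i < len(ticks) and ticks[i][0] <= start:
        last_price = ticks[i][1]
        i += 1

    bar_end = start + step
    while bar_end <= end:
        n_trades = 0
        # Consume every tick that falls inside the current interval.
        while i < len(ticks) and ticks[i][0] <= bar_end:
            last_price = ticks[i][1]
            n_trades += 1
            i += 1
        bars.append((bar_end, last_price, n_trades))
        bar_end += step
    return bars


# Made-up sample data: four trades at irregular times after a 09:30 open.
open_time = datetime(2024, 1, 2, 9, 30, 0)
ticks = [
    (open_time + timedelta(seconds=5), 100.0),
    (open_time + timedelta(seconds=40), 100.5),
    (open_time + timedelta(seconds=70), 100.2),
    (open_time + timedelta(seconds=200), 101.0),
]

bars = aggregate_ticks(
    ticks,
    start=open_time,
    end=open_time + timedelta(minutes=4),
    step=timedelta(minutes=1),
)

for bar_end, price, n_trades in bars:
    print(bar_end.time(), price, n_trades)
```

In this sample the third one-minute interval contains no transactions, so its trade count is zero and its price is carried forward from the preceding interval. A different choice of interval length changes how many such empty intervals appear, which is one reason the choice of time interval is a central decision.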