How to accurately predict cryptocurrency prices? Learn about deep learning algorithms for trading, technical and social sentiment indicators

Original title: Cryptocurrency Price Prediction: A Deep Learning Study of Trading, Technical and Social Sentiment Indicators

It may be an illusion of ours that deep participants and long-term holders in the crypto space earn higher returns than traders do; after all, crypto assets are naturally suited to arbitrage and high-frequency bot trading.

Although we retail investors, who have some faith but are not veterans, are somewhat disdainful of technical trading indicators, we are hardly indifferent to price fluctuations. The rise and fall of prices disturb our emotions more or less.

As for predicting (or simply having a feeling about) the price, most of us rely only on social sentiment. Feeding a large set of technical, trading, and social sentiment indicators into several deep learning algorithms, this paper concludes that combining technical, trading, and social sentiment indicators predicts price movements better than any single class of indicator, and that sentiment indicators derived from Github developers and Reddit users are particularly valuable references.

This is not necessarily correct or definitive; after all, both the deep learning algorithms and the data may have problems. Still, this 30-page paper is enough to give us pause: if such advanced algorithms and such open, rich data can already be used to trade crypto assets, what should we retail investors do?

Perhaps just do your homework: Why should we invest in this project? How can we contribute to it? And how can we stop obsessing over the coin price?

Title: On Technical Trading and Social Media Indicators in Cryptocurrencies' Price Classification Through Deep Learning

Author(s): Marco Ortu, Nicola Uras, Claudio Conversano, Giuseppe Destefanis, Silvia Bartolucci

URL: http://arxiv.org/abs/2102.08189

Summary

Predicting the price of cryptocurrencies is a notoriously difficult task due to the high volatility of the cryptocurrency market and the presence of novel market mechanisms. In this work, we focus on two major cryptocurrencies, Ethereum and Bitcoin, over the period 2017-2020. A comprehensive analysis of the predictability of price fluctuations is performed by comparing four different deep learning algorithms (Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) Neural Network, and Attention Long Short-Term Memory (ALSTM)) and three categories of features. In particular, we consider technical indicators (such as opening and closing prices), trading indicators (such as moving averages), and social indicators (such as user sentiment) as inputs to the classification algorithms. We compare a restricted model consisting of only technical indicators and an unrestricted model that includes technical, trading, and social media indicators. The results show that the unrestricted model outperforms the restricted model: including trading and social media indicators alongside the classic technical variables yields a significant improvement in the prediction accuracy of all algorithms.

1 Introduction

Over the past decade, global markets have witnessed the rise and exponential growth of cryptocurrency trading, with a global daily market capitalization of hundreds of billions of dollars (reaching approximately $1 trillion as of January 2021).

Recent surveys show that despite the risks associated with price volatility and market manipulation, institutional investors’ demand and interest in new crypto assets are surging due to their novel features and potential rise in value amid the current financial storm.

Boom and bust cycles are often caused by network effects and wider market adoption, making prices difficult to predict with high accuracy. There is a large literature on this issue, and many quantitative methods for cryptocurrency price prediction have been proposed [13, 15–18]. The rapid fluctuations in volatility, autocorrelation, and multiscaling effects of cryptocurrencies have also been widely studied [22], as have their effects on initial coin offerings (ICOs) [10, 11].

An important consideration that has gradually emerged in the literature is the “social” relevance of cryptocurrency trading. The underlying code of blockchain platforms is developed in an open-source manner on Github, the latest additions to the crypto ecosystem are discussed on dedicated channels on Reddit or Telegram, and Twitter provides a platform for often heated debates on the latest developments. More precisely, it has been shown that sentiment indices can be used to predict price bubbles [5] and that sentiment extracted from Reddit discussions is correlated with prices [28].

Open source development also plays an important role in shaping the success and value of cryptocurrencies [21, 25, 27]. In particular, a previous work by Bartolucci et al. [2] (which this work extends) showed that there is a Granger causal relationship between sentiment time series extracted from developer comments on Github and the returns of cryptocurrencies. For two major cryptocurrencies, Bitcoin and Ethereum, it was also shown how incorporating developer sentiment time series into a forecasting algorithm can significantly improve the accuracy of the forecast.

In this paper, we further extend previous research on price predictability using deep learning methods and focus on the two largest cryptocurrencies by market capitalization, Bitcoin and Ethereum.

We treat price prediction at each time step as a classification problem: our target is a binary variable with two classes, up and down, indicating an increase or decrease in price. Below we compare the performance and results of four deep learning algorithms: Multilayer Perceptron (MLP), Multivariate Attention Long Short-Term Memory Fully Convolutional Network (MALSTM-FCN), Convolutional Neural Network (CNN), and Long Short-Term Memory Neural Network (LSTM).

We will use as input the following categories of (financial and social) indicators: (i) technical indicators, such as opening and closing prices or volumes, (ii) trading indicators, such as momentum and moving averages calculated from prices, and (iii) social media indicators, i.e. sentiment elements extracted from Github and Reddit comments.

For each deep learning algorithm, we consider a restricted and an unrestricted data model at hourly and daily frequencies. The restricted model consists of the technical variables for Bitcoin and Ethereum. The unrestricted model adds trading indicators and social media indicators from Github and Reddit.

In all four deep learning algorithms, we were able to demonstrate that the unrestricted model outperforms the restricted model. At hourly data frequency, combining trading and social media indicators with classical technical indicators improves the accuracy of Bitcoin and Ethereum price predictions from 51-55% for the restricted model to 67-84% for the unrestricted model. For daily frequency resolution, in the case of Ethereum, the most accurate classification is achieved using the restricted model. In contrast, for Bitcoin, the unrestricted model including only social media indicators achieves the highest performance.

In the following sections, we discuss in detail the implemented algorithms and the bootstrapped validation techniques used to evaluate the model performance.

This paper is organized as follows. In Section 2, we describe the data and metrics used in detail. In Section 3, we discuss the methodology of the experiments. In Section 4, we present the results and their significance, and in Section 5, we discuss the limitations of this study. Finally, in Section 6, we summarize our findings and outline future directions.

2 Datasets: Technical and Social Media Indicators

This section discusses the datasets and the three categories of metrics used for the experiments.

2.1 Technical indicators

We analyzed Bitcoin and Ethereum price time series at hourly and daily frequencies. We extracted all available technical variables from crypto data download web services, in particular from the Bitfinex.com website transaction data service. We considered the last 4 years, from 2017/01/01 to 2021/01/01, for a total of 35,638 hours of observations.

In our analysis, we divide technical indicators into two categories: pure technical indicators and trading indicators. Technical indicators refer to "direct" market data such as opening and closing prices. Trading indicators refer to derived indicators such as moving averages.

The technical indicators are as follows:

    • Closing Price: The last trading price of a cryptocurrency during a trading period.

    • Opening Price: The price at which a cryptocurrency first trades at the start of a trading period.

    • Low: The lowest price at which a cryptocurrency is traded during a trading cycle.

    • High: The highest price at which a cryptocurrency has traded during the trading period.

    • Volume: The number of completed cryptocurrency transactions.

Tables 1 and 2 show the summary statistics of the technical indicators. In Figures 1 and 2, we also show the historical time series graphs of the technical indicators.

Based on these technical indicators, trading indicators such as moving averages can be calculated. More precisely, we use the StockStats Python library to generate them.

We used 36 different trading indicators, as shown in Table 4. The lag value is the number of previous values (t−1, …, t−n) used as input. The window size indicates the number of previous values used to evaluate the indicator at time t; for example, to calculate ADXR_t at time t, we use the ten previous values ADXR_{t−1}, …, ADXR_{t−10}.

We provide here definitions of six of these trading indicators; a minimal sketch of how they can be computed follows the list.

    • Simple Moving Average (SMA): The arithmetic mean of a cryptocurrency’s closing prices over a certain period of time (called a time period).

    • Weighted Moving Average (WMA): A moving average calculation that gives higher weight to the most recent price data.

    • Relative Strength Index (RSI): A momentum indicator that measures the magnitude of recent price changes. It is often used to assess whether a stock or other asset is overbought or oversold.

    • Rate of Change (ROC): Measures the percentage change between the current price and the price a certain period ago.

    • Momentum: The rate of price acceleration, i.e., the speed at which prices are changing. This measure is particularly useful for identifying trends.

    • On Balance Volume (OBV): A technical momentum indicator based on an asset's trading volume, used to predict stock price changes.
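
The paper generates these indicators with the StockStats Python library; as a minimal, self-contained illustration of the definitions above, the pandas sketch below computes SMA, WMA, ROC, RSI, and OBV from a generic OHLCV DataFrame (the column names `close` and `volume` and the default window are assumptions, not the paper's exact settings).

```python
import numpy as np
import pandas as pd

def trading_indicators(df: pd.DataFrame, window: int = 14) -> pd.DataFrame:
    """Compute a few of the trading indicators defined above from an
    OHLCV DataFrame with 'close' and 'volume' columns (assumed names)."""
    out = pd.DataFrame(index=df.index)

    # Simple Moving Average: arithmetic mean of the last `window` closes.
    out["sma"] = df["close"].rolling(window).mean()

    # Weighted Moving Average: linearly increasing weights, newest price heaviest.
    weights = np.arange(1, window + 1)
    out["wma"] = df["close"].rolling(window).apply(
        lambda x: np.dot(x, weights) / weights.sum(), raw=True
    )

    # Rate of Change: percentage change vs. the price `window` periods ago.
    out["roc"] = df["close"].pct_change(periods=window) * 100

    # Relative Strength Index: ratio of average gains to average losses.
    delta = df["close"].diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    out["rsi"] = 100 - 100 / (1 + gain / loss)

    # On Balance Volume: cumulative volume signed by the price direction.
    out["obv"] = (np.sign(delta).fillna(0) * df["volume"]).cumsum()

    return out
```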

Tables 3 and 5 show the trading indicator statistics for the considered analysis period. In Figures 3 and 4, we can see the same trading indicators in historical time series plots. The next section will use technical and trading indicators to create a price classification model.

2.2 Social Media Metrics

This section describes how time series of social media metrics are constructed from Ethereum and Bitcoin developer comments on Github and user comments on Reddit, respectively. In particular, for Reddit, we consider the four sub-Reddit channels listed in Table 6. The time period considered is from January 2017 to January 2021.

Examples of developer comments extracted from Github for Ethereum and of user comments extracted from the Reddit r/Ethereum subreddit can be seen in Tables 7 and 8. As shown in these examples, quantitative measures of the affect associated with each comment are calculated using state-of-the-art text analysis tools (detailed below). The social media metrics calculated for each comment are the emotions Love (L), Joy (J), Anger (A), and Sadness (S), the VAD dimensions (Valence (Val), Dominance (Dom), Arousal (Ar)), and Sentiment (Sent).

2.3 Evaluating Social Media Metrics via Deep Learning

We extract social media metrics using BERT [8], a deep pre-trained neural network (Bidirectional Encoder Representations from Transformers). BERT and other Transformer encoder architectures have been used successfully on a variety of Natural Language Processing (NLP) tasks and represent an evolution of the recurrent neural networks (RNNs) traditionally used in NLP. They compute vector-space representations of natural language suitable for use in deep learning models. The BERT family of models uses a Transformer encoder architecture to process each token of the input text in the full context of all the tokens before and after it, hence the name. BERT models are typically pre-trained on a large text corpus and then fine-tuned for a specific task; in this way they provide dense vector representations of natural language. The Transformer architecture is shown in Figure 5.

The Transformer is based on an attention mechanism. In a plain RNN, the cell encodes the input into a hidden vector h_t up to timestamp t, which is then passed to the next timestamp (or to the decoder in the case of sequence-to-sequence models). With an attention mechanism, one no longer tries to encode the complete source sentence into a fixed-length vector; instead, at each step of output generation, the decoder is allowed to attend to different parts of the source sentence. Importantly, the model learns what to attend to based on the input sentence and on what it has produced so far.

The Transformer architecture allows the creation of NLP models that are trained on very large datasets, as we do in this work. Training such models on large datasets is feasible since pre-trained language models can be fine-tuned on a specific dataset without retraining the entire network.

The weights learned by a broad pre-trained model can be reused later for a specific task by simply adjusting the weights to the specific dataset. This will allow us to leverage the knowledge learned by a pre-trained language model through more fine-grained weight adjustments by capturing the lower-level complexity of a specific dataset.

We used the TensorFlow and Keras Python libraries together with the Transformers package to leverage these pre-trained neural networks. In particular, we used the BERT base cased pre-trained model. Figure 6 shows the architecture used to train the three NN classifiers for extracting social media metrics; it shows the three gold datasets used to train the final models, namely Github, Stack Overflow, and Reddit.
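
As a rough sketch of this setup (not the authors' exact training code), the snippet below fine-tunes a `bert-base-cased` checkpoint for sentiment classification with the Hugging Face Transformers and Keras APIs; the `texts`/`labels` arrays and the three-class label scheme are placeholders standing in for the gold datasets described above.

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Placeholder gold data: short comments with sentiment labels 0/1/2
# (negative / neutral / positive), standing in for the Github, Stack
# Overflow, and Reddit datasets described in the text.
texts = ["Great fix, thanks!", "This build is broken again.", "Merged."]
labels = [2, 0, 1]

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = TFAutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=3
)

# Tokenize the comments into padded tensors BERT can consume.
encodings = tokenizer(texts, padding=True, truncation=True, return_tensors="tf")

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(dict(encodings), tf.constant(labels), epochs=3, batch_size=2)
```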

In particular, we used a sentiment-labeled dataset consisting of 4,423 posts mined from Stack Overflow user comments to train the sentiment model for Github: comments on both platforms are written using the technical terminology of software developers and engineers. We also used a labeled dataset of 4,200 sentences from Github [23]. Finally, we used a labeled dataset containing over 33K Reddit user comments.

Tables 9, 10, and 11 show the performance of sentiment and emotion classification on two different datasets, Github and Reddit.

2.3.1 Social Media Metrics on Github

Both the Bitcoin and Ethereum projects are open source, so the code and all interactions between contributors are publicly available on GitHub [26]. Active contributors continuously open, comment on, and close so-called “issues”. An issue is an element of the development process that contains information about bugs found, suggestions for new features to be implemented in the code, new features, or new functionality that is being developed. It is an elegant and effective way to track all stages of the development process, even in complex and large projects involving a large number of remote developers. An issue can be “commented on”, which means that developers can start a sub-discussion around it. They usually add a comment to a specific issue to highlight the action being taken or to make suggestions on possible solutions. Every comment posted on GitHub is timestamped; therefore, the exact time and date can be obtained and a time series can be generated for each impact metric considered in this study.

For sentiment analysis, we use the BERT classifier explained in 2.3, which is trained using the public Github sentiment dataset developed by Ortu et al. [24] and extended by Murgia et al. [23]. This dataset is particularly suitable for our analysis because the sentiment analysis algorithm is trained on developer comments extracted from the Apache Software Foundation's Jira issue tracking system and is therefore in the software engineering domain and context of Github and Reddit (considering selected subreddits). The classifier can analyze love, anger, joy, and sadness with an F1 score close to 0.89.

Valence, Arousal, and Dominance, also known as VAD, represent conceptualized affective dimensions that describe a subject's interest, alertness, and sense of control over a particular stimulus, respectively. In the context of software development, the VAD metrics can represent a developer's level of engagement with a project, as well as their confidence and responsiveness in completing tasks. Warriner et al. [30] created a reference dictionary of 14,000 English words with VAD scores, which can be used to train a classifier following the approach of Mantyla et al. [20]. In [20], the authors extracted VAD metrics from 700,000 Jira issue reports containing more than 2 million comments and showed that different types of issue reports (e.g., feature requests vs. bugs) carry different affect, and that increasing issue priority generally increases Arousal.

Finally, sentiment is measured using the BERT classifier explained in 2.3 and a public dataset used in similar studies [3,4]. The algorithm extracts the sentiment polarity expressed in the short text from three levels: positive (1), neutral (0), and negative (-1).

Our analysis focuses on three categories of affect metrics: emotion (love, joy, anger, sadness), VAD (valence, arousal, dominance), and sentiment. As specified in Section 2.3, we extract each class of affect metric from the comment text using the fine-tuned classifiers described above.

Once the numerical values of the affect measures are calculated for all comments (as shown in the examples in Tables 7 and 8), we construct the corresponding social media time series using the comment timestamps (i.e., the date and time each comment was posted). The series are obtained by aggregating sentiment and emotion over all comments within hourly or daily intervals, according to the temporal frequency considered.

For a given social media metric (e.g., anger) and a specific temporal frequency, we construct the time series by averaging that affect measure over all comments posted in the corresponding hour or day.
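
A minimal pandas sketch of this aggregation step is shown below (the column names `timestamp` and `anger` are illustrative, not the authors' schema):

```python
import pandas as pd

def sentiment_series(comments: pd.DataFrame, metric: str, freq: str = "1H") -> pd.Series:
    """Average a per-comment affect metric (e.g. 'anger') over fixed
    hourly ('1H') or daily ('1D') bins to build a social media time series."""
    indexed = comments.set_index(pd.to_datetime(comments["timestamp"]))
    return indexed[metric].resample(freq).mean()

# Example usage with an illustrative comments table:
# comments = pd.DataFrame({"timestamp": [...], "anger": [...]})
# hourly_anger = sentiment_series(comments, "anger", freq="1H")
# daily_anger  = sentiment_series(comments, "anger", freq="1D")
```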

In Tables 12 and 13, we report in detail the summary statistics of the social indicator time series for both cryptocurrencies, respectively. We also report the time series of all social media indicators for Bitcoin and Ethereum in Figures 7 and 8, respectively.

2.3.2 Measuring Reddit’s influence metrics

The social media platform Reddit is an American social news aggregation, web content rating and discussion website with approximately 8 billion visits per month. It is the most popular social network in English-speaking countries, especially Canada and the United States. Almost all messages are written in English, with a few in Spanish, Italian, French and German.

Reddit is organized into subreddits, each dedicated to discussing a specific topic, and there are dedicated subreddits for the major cryptocurrency projects. For each cryptocurrency in this work, two subreddits were analyzed, one technical and one trading-related; the subreddits considered are listed in Table 6. For each subreddit, we collected all comments from January 2017 to January 2021.

For sentiment detection, we use the BERT classifier explained in 2.3, which is trained using the public Github sentiment dataset developed by Ortu et al. [24] and extended by Murgia et al. [23]. This dataset is particularly suitable for our analysis, as described in the previous section.

The classifier can detect love, anger, joy, and sadness with an F1 score close to 0.89. For the VAD metrics we used the same approach as in 2.3.1, while for sentiment we used the BERT deep learning approach described earlier, trained on a public gold dataset of Reddit comments available on Kaggle.com, a large and well-known platform for shared datasets.

Tables 14 and 16 and Figures 9 and 11 show the statistics and time series of these two Bitcoin sub-Reddits.

Tables 17 and 15 and Figures 10 and 12 show the statistics and time series of these two Ethereum sub-Reddits.

2.4 Price Change Classification

The target variable is a binary variable with two unique classes listed below.

    • Up: This class, labeled Up and coded as 1, indicates a situation where the price is rising.

    • Down: This class, labeled Down and coded as 0, indicates a situation where the price is falling.

Figure 13 shows the class distributions of the hourly and daily datasets, highlighting that we are dealing with a fairly balanced classification problem at hourly frequency and a slightly imbalanced one at daily frequency.

Table 18 shows the details of the up and down instances: at hourly frequency they are 48.5% and 51.5% for Bitcoin and 49.8% and 50.2% for Ethereum; at daily frequency they are 44.8% and 55.2% for Bitcoin and 48.5% and 51.5% for Ethereum. For Bitcoin at daily frequency the up class is therefore slightly under-represented, so in that case we consider the F1 score together with accuracy to evaluate model performance.

2.5 Time Series Processing

Since we are working with a supervised learning problem, we prepare our data as a vector of inputs x and outputs y that are time dependent. The input vector x, called the regressor, consists of the predictor values of the model, i.e., one or more past values, the so-called lagged values. These inputs correspond to the values of the selected features discussed in the previous sections. The target variable y is a binary variable that can be either 0 or 1. A 0 (down) instance indicates that the price is going down: the instance at time t is labeled 0 when the difference between the opening price at time t+1 and the closing price at time t is less than or equal to 0. A 1 (up) instance indicates that the price is going up, i.e., a price increase: the instance is labeled 1 when the difference between the opening price at the next time step t+1 and the closing price at time t is greater than 0. We considered two time series models:

    • Restricted: The input vector x contains only technical indicators (open, close, high, low, volume).

    • Unrestricted: The input vector x consists of technical, trading, and social media indicators.

For both the restricted and unrestricted models, we use 1 lag value for each indicator. The purpose of this distinction is to determine and quantify whether adding trading and social media sentiment indicators to the regression vector effectively improves the classification of price changes in Bitcoin and Ethereum.
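
A small pandas sketch of this construction, under the assumption of an OHLCV DataFrame indexed by time and one lag per indicator as described above (column and feature names are illustrative):

```python
import pandas as pd

def build_supervised(df: pd.DataFrame, feature_cols, n_lags: int = 1):
    """Build lagged regressors x and the binary up/down target y,
    with y_t = 1 when open(t+1) - close(t) > 0, else 0."""
    X = pd.concat(
        {f"{col}_lag{k}": df[col].shift(k)
         for col in feature_cols for k in range(1, n_lags + 1)},
        axis=1,
    )
    y = (df["open"].shift(-1) - df["close"] > 0).astype(int)
    data = pd.concat([X, y.rename("target")], axis=1).dropna()
    return data.drop(columns="target"), data["target"]

# Restricted model: technical indicators only.
# X_r, y_r = build_supervised(df, ["open", "close", "high", "low", "volume"])
# Unrestricted model: technical + trading + social media indicators.
# X_u, y_u = build_supervised(df, ["open", "close", "high", "low", "volume",
#                                  "sma", "rsi", "anger", "valence"])
```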

3 Methodology

This section describes the deep learning algorithms considered in our analysis and then discusses the fine-tuning of hyperparameters.

3.1 Multilayer Perceptron

A Multilayer Perceptron (MLP) is a class of feedforward artificial neural network (ANN) composed of multiple layers of perceptrons with nonlinear activation functions.

The most common activation functions are the hyperbolic tangent, which varies between −1 and 1, and the logistic (sigmoid) function, which varies between 0 and 1:

y(v_i) = tanh(v_i)    and    y(v_i) = 1 / (1 + e^(−v_i)),

where y_i is the output of the i-th node (neuron) and v_i is the weighted sum of its input connections.

MLP contains three main categories of nodes: input layer nodes, hidden layer nodes, and output layer nodes. Except for the input nodes, all nodes of the neural network are perceptrons that use nonlinear activation functions. MLP is different from linear perceptrons because it has a multi-layer structure and nonlinear activation functions.

In general, MLP neural networks are very resistant to noise and can support learning and reasoning in the presence of missing values. Neural networks do not make strong assumptions about the mapping function and can easily learn linear and nonlinear relationships. Any number of input features can be specified, providing direct support for multi-dimensional forecasting. Any number of output values ​​can be specified, providing direct support for multi-step and even multivariate forecasting. For these reasons, MLP neural networks may be particularly useful for time series forecasting.

In recent developments of deep learning, the rectified linear unit (ReLU), a piecewise linear function, is more frequently used as the activation, as it avoids numerical problems (such as vanishing gradients) associated with the sigmoid-type functions above.

Alternative activation functions have also been developed, including the rectifier and softmax functions. More specialized activation functions include radial basis functions, used in radial basis networks, another class of supervised neural network models.

Since MLPs are fully connected architectures, each node in one layer is connected to each node in the next layer with a specific weight w_ij. Neural networks are trained using supervised back-propagation combined with an optimization method (stochastic gradient descent is widely used). After each data sample is processed, the perceptron learns by adjusting the connection weights according to the amount of error in the output relative to the expected result. Back-propagation in the perceptron is a generalization of the least mean squares (LMS) algorithm.

When the n-th training sample is presented to the input layer, the error at output node j is e_j(n) = d_j(n) − y_j(n), where d_j(n) is the desired (target) value and y_j(n) is the value actually produced by the perceptron. The backpropagation method then adjusts the node weights to minimize the overall output error given by equation (2):

ε(n) = (1/2) Σ_j e_j²(n).    (2)

The adjustment of each node weight is calculated by gradient descent as in formula (3), where y_i is the output of the previous neuron and η is the learning rate:

Δw_ji(n) = −η · (∂ε(n)/∂v_j(n)) · y_i(n).    (3)

The parameter η is chosen as a trade-off between making the weights converge quickly to a response and avoiding oscillation around it.

For an output node, the derivative of the error with respect to the induced local field v_j can be calculated directly:

−∂ε(n)/∂v_j(n) = e_j(n) · φ′(v_j(n)),

where φ′ is the derivative of the activation function, which itself does not vary. For a hidden node the analysis is more difficult, but it can be shown that the relevant derivative is the quantity in equation (4):

−∂ε(n)/∂v_j(n) = φ′(v_j(n)) · Σ_k ( −∂ε(n)/∂v_k(n) ) · w_kj(n).    (4)

This depends on the weight changes of the k-th nodes, which represent the output layer; in other words, to change the hidden-layer weights, the output-layer weights are adjusted first using the derivative of the activation function, and the error is then propagated backwards through the network, which gives the algorithm its name.

3.2 Long Short-Term Memory Network

Long short-term memory networks are a special form of recurrent neural networks (RNNs) that are able to capture long-term dependencies in data sequences. RNNs are artificial neural networks with a specific topology that are specialized for recognizing patterns in different types of data sequences: for example, natural language, DNA sequences, handwriting, word sequences, or digital time series data streams from sensors and financial markets [12]. Classic recurrent neural networks have a significant disadvantage in that they cannot process long sequences and capture long-term dependencies. RNNs can only be used for short sequences with short-term memory dependencies. LSTMs were developed to address the long-term memory problem and are directly derived from RNNs to capture long-term dependencies. LSTM neural networks are organized in units and perform transformations on the input sequence by applying a sequence of operations. The internal state variables are retained by the LSTM cells as they are forwarded from one unit to the next and are updated by so-called operation gates (forget gate, input gate, output gate), as shown in Figure 16. All three gates have different and independent weights and biases, so that the network can learn how much of the previous output and current input to maintain, and how much of the internal state to pass to the output. Such gates control how much of the internal state is transferred to the output and operate similarly to the other gates. The LSTM unit consists of:

  • Cell state: carries information across the entire sequence and represents the memory of the network.

  • Forget gate: decides which information from previous time steps is kept and which is discarded.

  • Input gate: decides what relevant information from the current time step is added.

  • Output gate: controls how much of the cell state is exposed in the output of the current time step.

The first step is the forget gate. This gate takes the past or lagged values as input and decides how much of the past information should be forgotten and how much should be kept. The previous hidden state and the current input are passed through the sigmoid function: the output is close to 0 when the information can be forgotten and close to 1 when it is to be kept, as shown below:

f_t = σ(W_f · x_t + U_f · h_{t−1} + b_f)

The matrices W_f and U_f contain the weights of the input and recurrent connections, respectively, and the subscript f denotes the forget gate. x_t is the input vector of the LSTM unit and h_{t−1} is the hidden state (output) vector of the previous time step.

The second gate is the input gate, where the cell state is updated. The previous hidden state and the current input are first passed to a sigmoid activation function (the closer the value is to 1, the more relevant the input). To improve the tuning of the network, the hidden state and the current input are also passed to a tanh function, which squashes the values between −1 and 1. The tanh and sigmoid outputs are then multiplied element-wise (below, the symbol * denotes element-wise multiplication). The sigmoid output in Equation 6 determines how much of the tanh output is retained:

i_t = σ(W_i · x_t + U_i · h_{t−1} + b_i),    C̃_t = tanh(W_c · x_t + U_c · h_{t−1} + b_c)    (6)

Once the input gate is applied, the cell state can be updated. The cell state from the previous time step is multiplied element-wise by the forget gate output, so that values in the cell state are dropped when they are multiplied by values close to 0. The input gate output is then added element-wise. The new cell state in Equation 7 is:

C_t = f_t * C_{t−1} + i_t * C̃_t    (7)

The last gate is the output gate, which determines the next hidden state, carrying a certain amount of information about the previous inputs. Here, the current input and the previous hidden state are combined and passed through the sigmoid function, while the new cell state is passed through the tanh function. Finally, the tanh output is multiplied by the sigmoid output to determine what information the hidden state should carry. The output is the new hidden state; the new cell state and the new hidden state are then passed on to the next time step via Equation 8:

o_t = σ(W_o · x_t + U_o · h_{t−1} + b_o),    h_t = o_t * tanh(C_t)    (8)

To perform this analysis, we used the Keras framework [7] for deep learning. Our model consists of stacked LSTM layers and a densely connected output layer with one neuron.
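
A hedged Keras sketch of such a model (the number of units and of stacked layers is illustrative; the tuned values come from the grid search in Section 3.5):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_lstm(n_timesteps: int, n_features: int) -> tf.keras.Model:
    """Stacked LSTM followed by a single-neuron dense output layer."""
    model = models.Sequential([
        layers.Input(shape=(n_timesteps, n_features)),
        layers.LSTM(64, return_sequences=True),  # first LSTM layer feeds sequences onward
        layers.LSTM(32),                         # second LSTM layer returns the final state
        layers.Dense(1, activation="sigmoid"),   # up/down probability
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Inputs are shaped (samples, lagged time steps, indicators), e.g. with 1 lag per feature:
# model = build_lstm(n_timesteps=1, n_features=X_train.shape[-1])
```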

3.3 Attention Mechanism Neural Network

Attention functions are an important component of deep learning models; they extend the encoder-decoder paradigm and aim to improve performance on long input sequences. Figure 16 shows the key idea: the decoder is allowed to selectively access encoder information during decoding. This is achieved by building a new context vector for each decoder step, computed from the previous decoder hidden state and all the encoder hidden states, with trainable weights assigned to them. In this way, the attention mechanism assigns different priorities to the inputs and focuses on the most important ones.

The encoder operates much as it does in the plain encoder-decoder model: the representation of each input element is computed at each time step as a function of the hidden state at the previous time step and the current input.

The final hidden state includes all the encoded information from the previous hidden representations and the previous input.

The key difference between the attention mechanism and the encoder-decoder model is that a new context vector c(t) is computed for each decoder step t. We proceed as follows to compute the context vector c(t) at time step t. First, for each combination of encoder time step j and decoder time step t, we compute the so-called alignment score e(j, t) with the weighted sum in Equation (9):

e(j, t) = V_a · tanh(W_a · h(j) + U_a · s(t−1))    (9)

W_a, U_a, and V_a are learned weights, called attention weights: the W_a weights act on the hidden states of the encoder, the U_a weights act on the hidden state of the decoder, and the V_a weights define the function that computes the alignment score. At each time step t, the scores e(j, t) are normalized over the encoder time steps j with the softmax function, yielding the attention weights α(j, t):

α(j, t) = exp(e(j, t)) / Σ_{j'} exp(e(j', t))

The attention weight α(j, t) represents the importance of the input at time j for decoding the output at time t. The context vector c(t) is then estimated as the attention-weighted sum of all encoder hidden states:

c(t) = Σ_j α(j, t) · h(j)

In this way, a so-called attention function driven by the context vector weights the most important inputs.
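
As a compact numerical illustration of the alignment scores, attention weights, and context vector defined above, the numpy sketch below scores every encoder state against the current decoder state, normalizes the scores with a softmax, and forms the context vector as the weighted sum (all dimensions and weight matrices are random placeholders, not learned values):

```python
import numpy as np

rng = np.random.default_rng(0)

d_h, d_s, d_a, T = 16, 16, 8, 5       # encoder/decoder/attention sizes, sequence length
H = rng.normal(size=(T, d_h))         # encoder hidden states h(1..T)
s_prev = rng.normal(size=d_s)         # decoder hidden state s(t-1)
W_a = rng.normal(size=(d_a, d_h))     # weights applied to the encoder states
U_a = rng.normal(size=(d_a, d_s))     # weights applied to the decoder state
V_a = rng.normal(size=d_a)            # scoring vector

# Alignment scores e(j, t) for every encoder step j.
e = np.array([V_a @ np.tanh(W_a @ H[j] + U_a @ s_prev) for j in range(T)])

# Attention weights via softmax over the encoder steps.
alpha = np.exp(e - e.max())
alpha /= alpha.sum()

# Context vector as the attention-weighted sum of encoder states.
c_t = (alpha[:, None] * H).sum(axis=0)
print(alpha.round(3), c_t.shape)
```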

The context vector c(t) is then passed to the decoder to compute the probability distribution of the next possible output; this decoding operation involves all time steps present in the input. The current hidden state s(t) is computed by the recurrent unit from the context vector c(t), the previous hidden state s(t−1), and the previous output ŷ(t−1):

s(t) = f(s(t−1), ŷ(t−1), c(t))

Using this mechanism, the model can identify the relationships between different parts of the input sequence and the corresponding parts of the output sequence. The decoder output at each time step t is then obtained by applying the softmax function to the attention-weighted hidden state.

Thanks to the attention weights, the attention mechanism yields better results than a plain LSTM on long input sequences.

In this study, we specifically used the Multivariate Attention LSTM with Fully Convolutional Network (MALSTM-FCN) proposed by Fazle et al. Figure 17 shows the architecture of MALSTM-FCN, including the number of neurons in each layer. The input sequence is fed in parallel to the fully convolutional branch and the attention LSTM branch, whose outputs are concatenated and passed to the output layer with a softmax activation for binary classification. The fully convolutional block contains three temporal convolutional blocks with 128, 256, and 256 filters, respectively, which serve as feature extractors; each convolutional layer is followed by batch normalization before the branches are concatenated. A dimension-shuffle layer transposes the temporal dimension of the input so that the LSTM receives the global temporal information of each variable at once. For time series classification problems, this dimension shuffle reduces training and inference time without losing accuracy [15].
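
A simplified Keras sketch of this two-branch design follows. The filter counts mirror the 128/256/256 description above; the LSTM size, dropout rate, and the plain LSTM standing in for the attention-augmented LSTM cell are illustrative simplifications rather than the paper's exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_malstm_fcn_like(n_timesteps: int, n_features: int) -> tf.keras.Model:
    inputs = layers.Input(shape=(n_timesteps, n_features))

    # LSTM branch: dimension shuffle so the recurrent layer sees each
    # variable's full temporal profile at once, as described in the text.
    x = layers.Permute((2, 1))(inputs)
    x = layers.LSTM(8)(x)              # plain LSTM stands in for the attention LSTM
    x = layers.Dropout(0.8)(x)

    # Fully convolutional branch: three temporal conv blocks (128, 256, 256 filters),
    # each followed by batch normalization and ReLU.
    y = inputs
    for filters, kernel in [(128, 8), (256, 5), (256, 3)]:
        y = layers.Conv1D(filters, kernel, padding="same")(y)
        y = layers.BatchNormalization()(y)
        y = layers.Activation("relu")(y)
    y = layers.GlobalAveragePooling1D()(y)

    # Concatenate both branches and classify with softmax over the two classes.
    merged = layers.concatenate([x, y])
    outputs = layers.Dense(2, activation="softmax")(merged)

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```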

3.4 Convolutional Neural Networks

Convolutional neural networks (CNNs) are a special type of neural network that are most commonly used in deep learning applications such as image processing, image classification, natural language processing, and financial time series analysis [6].

The most critical part of the CNN architecture is the convolutional layer. This layer performs a mathematical operation called a convolution. In this case, a convolution is a linear operation that involves multiplication between the input data matrix and a two-dimensional array of weights, called filters. These networks use convolution operations in at least one layer.

Convolutional neural networks have a similar structure to traditional neural networks, including input and output layers and multiple hidden layers. The main feature of a CNN is that its hidden layers are usually composed of convolutional layers that perform the above operation. Figure 18 describes the general architecture of CNNs for time series analysis. We use one-dimensional convolutional layers instead of the two-dimensional convolutions typical of image processing tasks. The first convolutional layer is followed by a pooling layer and then flattened so that the output layer can process the entire series at each step t. Many one-dimensional convolutional layers can be combined in a deep learning network.

For the CNN implementation, we again use the Keras framework [7] for deep learning. Our model consists of two or more stacked one-dimensional convolutional layers, followed by pooling and flattening, a densely connected layer with N neurons, and a final densely connected output layer with one neuron.
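
A minimal Keras sketch matching this description (filter counts, kernel sizes, and N are illustrative, not the grid-searched values):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(n_timesteps: int, n_features: int, n_dense: int = 64) -> tf.keras.Model:
    """Stacked 1D convolutions, pooling and flattening, then dense layers."""
    model = models.Sequential([
        layers.Input(shape=(n_timesteps, n_features)),
        layers.Conv1D(64, kernel_size=3, padding="same", activation="relu"),
        layers.Conv1D(64, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=1),       # pooling layer (trivial for a 1-lag window)
        layers.Flatten(),
        layers.Dense(n_dense, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # up/down probability
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```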

3.5 Hyperparameter tuning

Hyperparameter tuning is the process of optimizing the hyperparameters of a given algorithm, i.e., determining the configuration that yields the best performance as measured by a chosen prediction error. For each algorithm, we select the hyperparameters to optimize and define an appropriate search interval for each one, containing all values to be tested. The algorithm is then fitted to a specific portion of the dataset with the first selected hyperparameter configuration, and the fitted model is tested on a portion of the data not used during training. This test procedure returns a value of the chosen prediction error.

The grid-search optimization procedure [19] ends after all possible combinations of hyperparameter values have been tested, and the configuration with the best performance on the selected prediction error is chosen as the optimal one. Table 19 shows the hyperparameter search interval for each implemented algorithm. Since MALSTM-FCN is a specific deep neural network architecture, its number of layers, number of neurons per layer, and per-layer activation functions were pre-specified (as described in Section 3.3).

To ensure the robustness of the hyperparameter optimization, we used model validation techniques to evaluate how the performance of a given model generalizes to a separate dataset. This involves splitting the data into a training set (for fitting the model), a validation set (for validating the fitted model), and a test set (for assessing the generalization ability of the final optimized model). In our analysis, we implemented the bootstrap method with 37.8% out-of-bag samples and 10,000 iterations to validate the final hyperparameters.
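
A small sketch of this bootstrap validation step is given below. Out-of-bag samples arise automatically from sampling with replacement (on average roughly a third of the rows); the model builder, reduced iteration count, and training settings here are placeholders, not the paper's setup.

```python
import numpy as np

def bootstrap_validate(build_model, X, y, n_iter=100, seed=0):
    """Evaluate a model builder with bootstrap resampling: train on a sample
    drawn with replacement, test on the out-of-bag observations."""
    rng = np.random.default_rng(seed)
    n = len(X)
    scores = []
    for _ in range(n_iter):
        train_idx = rng.integers(0, n, size=n)            # bootstrap sample (with replacement)
        oob_idx = np.setdiff1d(np.arange(n), train_idx)   # out-of-bag rows
        if len(oob_idx) == 0:
            continue
        model = build_model()
        model.fit(X[train_idx], y[train_idx], epochs=10, verbose=0)
        _, acc = model.evaluate(X[oob_idx], y[oob_idx], verbose=0)
        scores.append(acc)
    return np.mean(scores), np.std(scores)

# Example usage with the MLP sketch above (illustrative only):
# mean_acc, std_acc = bootstrap_validate(lambda: build_mlp(X.shape[1]), X, y, n_iter=100)
```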

4 Experimental Results

In this section, we report and discuss the main results of the analysis, in particular those of the restricted and unrestricted models. The results are evaluated with standard classification error metrics: accuracy, F1 score, precision, and recall.

4.1 Hyperparameters of restricted models

We briefly discuss the fine-tuning of the hyperparameters of the four deep learning algorithms described in Section 3.5, considering the hourly frequency. Table 20 shows the best results obtained by the different neural network models via grid search in terms of the classification error metrics, together with the best identified parameters for the MALSTM-FCN and MLP models.

The neural network achieving the best accuracy is MALSTM-FCN, with a mean accuracy of 53.7% and a standard deviation of 2.9%. Among the implemented models, MALSTM-FCN also attains the best F1 score, with a mean of 54% and a standard deviation of 2.01% (LSTM obtains the same F1 score, but with higher variance).

4.2 Hyperparameters of the unrestricted models

Table 21 shows the best results for the classification error metrics obtained by the neural network models via grid search, together with the best identified parameters for the CNN and LSTM models.

The results for the unrestricted model show that adding trading and social media indicators effectively improves the average accuracy, i.e., reduces the prediction error. This result is consistent across all implemented algorithms, which allows us to exclude that it is a statistical fluctuation or an artifact of a specific classification algorithm. The best result for the unrestricted model is obtained with the CNN, with an average accuracy of 87% and a standard deviation of 2.7%.

4.3 Results and discussion

Table 22 shows the results of the hourly-frequency price change classification task for the four deep learning algorithms, for the restricted (top) and unrestricted (bottom) models. First, for all four algorithms the unrestricted model outperforms the restricted model in terms of accuracy, precision, recall, and F1 score. Accuracy ranges from 51% for the restricted MLP to 84% for the unrestricted CNN and LSTM.

The results of the four classifiers are consistent, further confirming that the improvement is not due to statistical fluctuations but to the higher predictive power of the unrestricted model. For Bitcoin, the highest performance is obtained with the CNN architecture, and for Ethereum with the LSTM.

We further explored the hourly-frequency classification of the unrestricted model by considering two sub-models: one including only technical and social indicators, and one including all indicators (social, technical, and trading). This clarifies the respective impact of social and trading indicators on model performance. We performed a statistical t-test on the distributions of accuracy, precision, recall, and F1 score of the two unrestricted sub-models and found that adding the social indicators alone does not significantly improve the unrestricted model. Therefore, in Table 22 we omit the unrestricted model that includes only social and technical indicators.

Table 23 shows the results of the four deep learning algorithms for daily-frequency price changes, for the restricted (top) and unrestricted (bottom) models. The unrestricted model is further split into technical-social and technical-social-trading sub-models to better highlight the respective contributions of the social and trading indicators.

5 Threats to Validity

In this section, we discuss potential limitations and threats to the validity of our analysis. First, our analysis focuses on Ethereum and Bitcoin: this may pose a threat to external validity, as analyses of different cryptocurrencies might lead to different results.

Second, threats to internal validity concern confounding factors that could affect the outcome. Based on empirical evidence, we assume that technical, trading, and social indicators are exhaustive in our model; nevertheless, this study may have overlooked other factors that affect price movements.

Finally, threats to construct validity concern how accurately the observations describe the phenomenon of interest. The detection and classification of price changes are based on objective data that describe the phenomenon in full. Technical and trading indicators are based on objective data and are generally reliable. Social media indicators, however, are based on measurements obtained by deep learning algorithms trained on public datasets: these datasets may carry intrinsic biases, which in turn translate into emotion and sentiment classification errors.

6 Conclusion

In the recent literature, there have been many attempts to model and predict the volatile behavior of prices and other market indicators for the major cryptocurrencies. Although many research groups are pursuing this goal, the analysis of cryptocurrency markets remains one of the most controversial and elusive tasks. Several aspects make the problem complex. Because of their relative youth, cryptocurrency markets are very active and fast-paced, and the appearance of new cryptocurrencies is a regular event that leads to unexpected and frequent changes in the composition of the market itself. In addition, the high price volatility of cryptocurrencies and their "virtual" nature are at once a boon for investors and traders and a curse for any serious theoretical and empirical model, despite the great practical significance of such models. Research on such a young market, whose price behavior remains largely unexplored, is therefore of interest not only to the scientific community but also to the major players and stakeholders in the crypto market landscape.

In this paper, we aimed to evaluate whether adding social and trading indicators to the "classic" technical variables leads to an actual improvement in the classification of cryptocurrency price changes (at hourly and daily frequencies). To this end, we implemented and benchmarked a range of deep learning techniques: the multilayer perceptron (MLP), the multivariate attention long short-term memory fully convolutional network (MALSTM-FCN), the convolutional neural network (CNN), and the long short-term memory (LSTM) neural network. We considered the two major cryptocurrencies, Bitcoin and Ethereum, and analyzed two models: a restricted model using only technical indicators, and an unrestricted model that also includes trading and social indicators.

For the daily-frequency classification, next-day price trends are generally predicted better by the technical indicators alone: the more indicators we add to the model, the more the performance drops.

Another general result is that the accuracy, precision, recall, and F1 score of the daily price change classification are much better for Ethereum than for Bitcoin. Our results show that high performance in cryptocurrency price change classification can be achieved through careful design and fine-tuning of the deep learning architecture.
