What Data Science and Big Data Can Tell About Bitcoin

Cryptocurrency is a type of digital money recorded on decentralized, cryptographically secured electronic ledgers. Bitcoin, the first cryptocurrency, was proposed in 2008 by Satoshi Nakamoto and has been traded since 2009 [1]. According to cryptocurrency statistics on Statista.com, the Bitcoin market price reached 45,604 USD on May 17, 2021, up from 196 USD in October 2013 [2].

Bitcoin market price from Oct 2013 to May 17, 2021, shown on Statista.com [2].

The vast amount of data generated by crypto transactions can be used to make automated investment decisions. Automation at this scale brings in three related fields: artificial intelligence (AI), data science, and big data. AI focuses on algorithms that learn from data and make automated decisions. Data science is an interdisciplinary field that uses statistics and machine learning to generate actionable insights, visualizations, and reports from data. Big data is a research field focused on building software and hardware infrastructure that can handle the five V’s of massive datasets: volume, velocity, variety, veracity, and value.

It seems beneficial to apply these technologies in the data-intensive crypto space, for example to analyze cryptocurrency data and predict trends. In this article, we share some useful resources we found on applying AI to cryptocurrency, drawn mainly from the systematic reviews of Hassani et al. [1, 15]. We take a general look at how cryptocurrency benefits from big data analytics and data science.

Why Does One Want to Use Cryptocurrency?

Since Bitcoin’s debut, the crypto market has boomed with many other cryptocurrencies. As of May 15, 2021, Coinbase.com listed 5,129 cryptocurrencies [3]. Bitcoin and Ethereum rank as the top two cryptocurrencies by market capitalization on Coinbase.com [3], Statista.com [4], and Coinmarketcap.com [5]. Interestingly, Dogecoin has been made popular by tweets from Tesla CEO Elon Musk [6]. Some of us might wonder: why would anyone want to use cryptocurrency?

The top four market-capitalized cryptos are Bitcoin, Ethereum, Bitcoin Cash, and Litecoin, as displayed on Coinbase.com on May 17, 2021 [3].

Cryptocurrency was created as a trusted, decentralized alternative to current financial systems. Every crypto transaction is recorded and secured by a blockchain, a shared ledger (similar in purpose to a bank’s books) that continuously verifies transactions and makes all verified changes permanent [7]. Verification is done through computational work instead of human examination: in Bitcoin, for example, miners must solve computationally expensive hash puzzles (proof of work) before a block of transactions is accepted. No banks or governments act as intermediaries. Instead, crypto blockchains are distributed to and shared by all participants in the network, forming a decentralized system in which every participant has full control over their assets. Now, let’s see how big data analytics can be useful here.
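To make the idea of computational verification more concrete, here is a toy Python sketch of a hash-linked chain with a simplified proof of work. It is an illustration under stated assumptions, not Bitcoin’s real format: the JSON block layout, the leading-zero difficulty rule, and the helper names block_hash, mine_block, and is_valid_chain are all hypothetical. Real Bitcoin blocks are verified by applying double SHA-256 to an 80-byte block header and comparing the result with a numeric target.

```python
import hashlib
import json

DIFFICULTY = 3  # toy value: number of leading zero hex digits a block hash needs


def block_hash(block: dict) -> str:
    """SHA-256 hex digest of the block's canonical JSON encoding."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def mine_block(prev_hash: str, transactions: list, difficulty: int = DIFFICULTY) -> dict:
    """Try nonces until the block hash starts with enough zeros (the proof of work)."""
    nonce = 0
    while True:
        block = {"prev_hash": prev_hash, "transactions": transactions, "nonce": nonce}
        if block_hash(block).startswith("0" * difficulty):
            return block
        nonce += 1


def is_valid_chain(chain: list, difficulty: int = DIFFICULTY) -> bool:
    """Check each block's proof of work and its link to the previous block."""
    for i, block in enumerate(chain):
        if not block_hash(block).startswith("0" * difficulty):
            return False
        if i > 0 and block["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True


genesis = mine_block("0" * 64, ["coinbase -> alice: 50"])
second = mine_block(block_hash(genesis), ["alice -> bob: 10"])
chain = [genesis, second]
print(is_valid_chain(chain))  # True

# Rewriting history invalidates the chain: the altered block's hash changes,
# so it no longer matches the prev_hash stored in the following block.
chain[0]["transactions"] = ["coinbase -> mallory: 50"]
print(is_valid_chain(chain))  # False
```

Mining requires many hash attempts, while checking needs only one hash per block; that asymmetry is what lets every participant verify the ledger independently.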

Data-Driven Crypto Analyses and Predictions

A straightforward application of big data and data science in the crypto space is cryptocurrency analytics. Big data infrastructure can handle the massive volume of data generated by transactions, and data science techniques can turn it into investment insights and predictions. Analyzing transaction data makes it possible to model the price fluctuations of a given cryptocurrency (for example, to forecast the future Bitcoin price), which can help investors improve profitability and avoid substantial losses. Forecasting models can also be trained on social data: combining information such as user activity and participation with transaction data, the current market price, and the network’s computing power can yield better predictions of market volatility over time [8].
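A common first step in this kind of analysis is to turn a price series into returns and measure volatility as the standard deviation of those returns. The Python sketch below assumes daily closing prices stored in a plain list and uses hypothetical numbers; the function names are illustrative, and annualizing with 365 periods (because crypto markets trade every day) is a convention we chose, not something specified in the cited works.

```python
import math
import statistics


def log_returns(prices: list[float]) -> list[float]:
    """Log return between consecutive prices: ln(p_t / p_(t-1))."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def rolling_volatility(returns: list[float], window: int) -> list[float]:
    """Sample standard deviation of returns over each trailing window."""
    if window < 2:
        raise ValueError("window must be at least 2 for a sample standard deviation")
    return [
        statistics.stdev(returns[end - window:end])
        for end in range(window, len(returns) + 1)
    ]


def annualize(daily_vol: float, periods_per_year: int = 365) -> float:
    """Scale daily volatility by sqrt(periods); 365 because crypto trades every day."""
    return daily_vol * math.sqrt(periods_per_year)


# Hypothetical daily closing prices in USD (illustrative only, not market data).
prices = [100.0, 102.0, 99.0, 105.0, 104.0, 110.0, 108.0]
rets = log_returns(prices)
vols = rolling_volatility(rets, window=3)
print([round(v, 4) for v in vols])
print(f"annualized volatility of the last window: {annualize(vols[-1]):.1%}")
```

Log returns are used because they add up over time, which makes multi-day aggregation and volatility scaling straightforward.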

AI Cryptocurrency Forecasting

Here are some attempts to forecast crypto prices and market volatility using AI:

  • Peng et al. combined support vector regression with GARCH models to forecast the volatility of cryptocurrencies, obtaining good predictions at both low and high data frequencies [9].
  • Jang and Lee applied Bayesian neural networks to blockchain data to predict the Bitcoin price, reporting low error rates [10].
  • McNally et al. applied a Bayesian-optimized recurrent neural network and a long short-term memory (LSTM) network to predict the direction of the Bitcoin price [11].
  • Nakano et al. applied artificial neural networks to predict intraday Bitcoin returns for technical trading [12].
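Peng et al. [9] build on GARCH-family volatility models. For readers unfamiliar with them, the Python sketch below shows only the standard GARCH(1,1) conditional-variance recursion, in which today’s variance is a constant ω plus α times yesterday’s squared return plus β times yesterday’s variance. The parameters and returns are hand-picked and hypothetical, and the sketch does not reproduce their support vector regression approach; in practice ω, α, and β are estimated from data, typically by maximum likelihood.

```python
def garch11_variances(returns, omega, alpha, beta):
    """Conditional variances from a GARCH(1,1) model.

    sigma2[t] = omega + alpha * r[t-1] ** 2 + beta * sigma2[t-1]
    Returns are assumed to be demeaned, so each r acts as the shock term.
    The last element is the one-step-ahead variance forecast.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0 and alpha + beta < 1")
    # Start from the long-run (unconditional) variance omega / (1 - alpha - beta).
    sigma2 = [omega / (1 - alpha - beta)]
    for r in returns:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


# Hypothetical parameters and daily returns, chosen by hand rather than estimated.
daily_returns = [0.010, -0.025, 0.004, 0.060, -0.015]
variances = garch11_variances(daily_returns, omega=2e-5, alpha=0.10, beta=0.85)
forecast_vol = variances[-1] ** 0.5
print(f"next-day volatility forecast: {forecast_vol:.4f}")
```

The β term makes volatility persistent, while the α term lets a large return (such as the 6% move above) raise the next day’s forecast.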

Big data analytics can be used to predict market volatility [13], and reliable implementations can be developed further into crypto trading systems. Since almost 90% of Wall Street trading volume is reportedly executed algorithmically [14], similar approaches could be applied to cryptocurrency. Nevertheless, these data-driven, automated crypto trading services do not yet perform at the level of human experts [15].

Blockchain Security Enhancement with Big Data Analytics

With blockchain technology, a decentralized crypto network enables safe, irreversible peer-to-peer data sharing. However, a system without central governance naturally attracts cybercrime, so it must be protected continuously. AI can be used to enhance the security of crypto networks, for example by identifying fraudulent user behavior, preventing theft, and avoiding information leaks [15]. By proactively uncovering anomalous patterns and activities in the blockchain, the network can remain a secure place to carry out transactions.
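Uncovering anomalous activity can start with something as simple as flagging transactions whose values are far from typical. The Python sketch below applies the modified z-score, a standard robust outlier rule based on the median and the median absolute deviation (MAD), to a hypothetical list of transfer amounts; the 3.5 cutoff is a commonly used default for this rule. Production systems, including those in the works cited here, rely on far richer features and learned models.

```python
import statistics


def modified_z_scores(values):
    """Robust z-scores using the median and the median absolute deviation (MAD)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case: most values are identical, so MAD is zero.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores for normal data.
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalies(values, threshold=3.5):
    """Indices of values whose modified z-score exceeds the threshold in magnitude."""
    return [i for i, z in enumerate(modified_z_scores(values)) if abs(z) > threshold]


# Hypothetical transfer amounts (in BTC) sent from one address over a week.
amounts = [0.5, 0.7, 0.6, 0.55, 0.65, 250.0, 0.6]
print(flag_anomalies(amounts))  # [5]
```

The median and MAD are used instead of the mean and standard deviation because a single huge transfer would inflate the standard deviation and partly mask itself.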

Proactive Security Check and Monitoring

Here are some attempts to strengthen the security of crypto blockchains with data-driven analytics:

  • Sun Yin and Vatrapu trained supervised classifiers on 853 labeled transactions to identify cybercriminal entities in Bitcoin. The models reached 80% accuracy and were then applied to more than 100,000 unlabeled transactions [16].
  • Jourdan et al. applied a gradient-boosted decision tree algorithm to classify unlabeled entities as exchanges, services, gambling sites, or mining pools. It achieved high F1 scores (0.88-0.97) for the first three categories, while the mining pool category reached only 0.67 [17].
  • Di Francesco Maesa et al. applied clustering heuristics to transaction graph data to detect suspicious chains of crypto transactions [18].
  • Dey applied algorithmic game theory and machine learning methods to defend the blockchain against majority (51%) attacks [19].
  • Akcora et al. applied topological data analysis to address graphs to identify suspicious addresses associated with ransomware payments in Bitcoin [20].
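Address clustering in Bitcoin often starts from a simple observation: when a transaction spends several inputs, they are usually signed by the same wallet. The Python sketch below implements this common-input-ownership heuristic with a union-find structure on a hypothetical, simplified transaction format (a dict with an "inputs" list). It illustrates the general idea only and is not the specific method of Di Francesco Maesa et al. [18]; the heuristic can also be misled by privacy techniques such as CoinJoin.

```python
class UnionFind:
    """Disjoint-set forest with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_by_common_inputs(transactions):
    """Group input addresses that appear together in at least one transaction.

    Each transaction is a dict with an "inputs" list of spending addresses
    (a simplified, hypothetical format). The heuristic assumes that all inputs
    of one transaction are controlled by the same entity.
    """
    uf = UnionFind()
    for tx in transactions:
        inputs = tx["inputs"]
        for addr in inputs:
            uf.find(addr)  # register every address, including lone inputs
        for addr in inputs[1:]:
            uf.union(inputs[0], addr)
    clusters = {}
    for addr in list(uf.parent):
        clusters.setdefault(uf.find(addr), set()).add(addr)
    return sorted(sorted(group) for group in clusters.values())


# Hypothetical transactions: A+B spend together, then B+C, so A, B and C merge.
txs = [
    {"inputs": ["A", "B"]},
    {"inputs": ["B", "C"]},
    {"inputs": ["D"]},
    {"inputs": ["E", "F"]},
]
print(cluster_by_common_inputs(txs))  # [['A', 'B', 'C'], ['D'], ['E', 'F']]
```

Union-find is a natural fit here because clusters merge transitively (A with B, B with C) and the structure handles millions of merges in near-linear time.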

Data Science for Risk Management

Having covered market price prediction and blockchain security, let us now look at how AI can be applied to monitor and manage crypto risks. The volatility of cryptocurrencies can be affected by external factors such as community sentiment, blockchain forks, regulatory updates, the rise of competitors, technological advances, major social events, and marketing campaigns. By combining data from these sources with transaction data, it may be possible to predict potential changes in cryptocurrency prices.

Two events are often associated with significant increases in the Bitcoin price: forking and halving. Halving cuts the reward that miners receive for each validated block in half; in Bitcoin it happens every 210,000 blocks, roughly every four years. Miners received 50 bitcoins per validated block until the first halving in 2012 [21], and after the third halving in May 2020 the reward dropped to 6.25 bitcoins per block. Forking splits a cryptocurrency blockchain into two, forming separate records of transactions. Both events naturally affect the price and market capitalization of the cryptocurrencies involved.
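Because halving is written into Bitcoin’s consensus rules, the block reward can be computed directly from the block height. The Python sketch below is modeled on the integer arithmetic of Bitcoin Core’s GetBlockSubsidy function; the constant and function names here are illustrative, amounts are in satoshis (1 BTC = 100,000,000 satoshis), and transaction fees, which miners also collect, are ignored. Block 630,000 is the height at which the May 2020 halving took effect.

```python
COIN = 100_000_000            # satoshis per bitcoin
HALVING_INTERVAL = 210_000    # blocks between halvings
INITIAL_SUBSIDY = 50 * COIN   # block reward before the first halving, in satoshis


def block_subsidy(height: int) -> int:
    """New coins (in satoshis) a miner may claim for the block at this height.

    Transaction fees, which miners also collect, are not included.
    """
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        # Mirrors Bitcoin Core's guard against an undefined 64-bit shift;
        # in Python the shift alone would already yield 0.
        return 0
    return INITIAL_SUBSIDY >> halvings


for height in (0, 210_000, 420_000, 630_000):
    print(height, block_subsidy(height) / COIN)
# 0 50.0
# 210000 25.0
# 420000 12.5
# 630000 6.25
```

Working in integer satoshis with a right shift keeps the calculation exact, so every node on the network arrives at the same reward.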

Crypto and Bitcoin Visualization for Market Movements

Here are some attempts to visualize the influence of crypto-related events using data science and big data analytics to manage investment risks: 

  • Colianni et al. applied text classification models (support vector machines, logistic regression, and Naïve Bayes) to crypto-related tweets to develop algorithmic trading strategies, reporting more than 90% accuracy in predicting crypto market movements [22].
  • Colianni et al. also applied sentiment analysis to tweets to capture the short-term overreaction tendency of crypto investors and predict price volatility [22].
  • Kim et al. applied sentiment classification to user comments in online crypto communities to predict the prices and transaction volumes of cryptocurrencies [23].
  • Lu et al. applied sentiment analysis to social media data to identify the factors driving Bitcoin adoption in Taiwan [24].
  • Phillips and Gorse applied sentiment analysis to a combination of cryptocurrency-related data sources, including Reddit posts, trading volume, Google search volume, and Wikipedia views, to predict cryptocurrency price bubbles [25].
  • Karalevicius et al. applied sentiment analysis to news articles and blog posts to measure the influence of media sentiment on the Bitcoin price [26].
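Several of these studies frame the problem as text classification. The Python sketch below is a from-scratch multinomial Naive Bayes classifier, one of the model families Colianni et al. [22] list, trained on a handful of made-up tweets labeled with a hypothetical next-day price direction ("up" or "down"). The whitespace tokenizer and the tiny training set are deliberate simplifications, and this is not a reconstruction of any cited study’s pipeline.

```python
import math
from collections import Counter


def tokenize(text):
    """Very simple whitespace tokenizer (real pipelines handle URLs, emojis, etc.)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes text classifier with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def score(self, doc, c):
        """log P(c) plus the sum of log P(word | c) over known words in the document."""
        denom = self.totals[c] + len(self.vocab)
        total = self.log_priors[c]
        for word in tokenize(doc):
            if word in self.vocab:  # words never seen in training are skipped
                total += math.log((self.word_counts[c][word] + 1) / denom)
        return total

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.score(doc, c))


# Made-up tweets labeled with a hypothetical next-day price direction.
tweets = [
    "bitcoin to the moon buying more",
    "bullish breakout btc rally",
    "great news adoption rally",
    "crash incoming selling everything",
    "bearish dump panic sell",
    "scam fears price crash",
]
labels = ["up", "up", "up", "down", "down", "down"]
model = NaiveBayes().fit(tweets, labels)
print(model.predict("huge rally looks bullish"))  # up
print(model.predict("panic and crash"))  # down
```

Scores are summed in log space to avoid numerical underflow when multiplying many small word probabilities.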

The crypto market is heavily affected by public opinion [27]. Because social media is a hotspot for all kinds of opinions and views, it is a natural place to collect public sentiment about cryptocurrencies. This sentiment can be analyzed and visualized to show how public opinion influences the crypto market in general and Bitcoin in particular. For example, Mai et al. [28] found that bullish forum posts positively influence Bitcoin returns at a daily frequency, while microblog posts influence the Bitcoin market at an hourly frequency.
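One simple way to explore a relationship like the daily one described above is to aggregate post-level sentiment into a daily score and correlate it with the next day’s return. The Python sketch below assumes posts have already been labeled bullish (+1) or bearish (-1), uses made-up numbers, and computes a plain Pearson correlation; the function names are hypothetical, and published studies typically rely on more careful econometric models, since correlation alone does not establish influence.

```python
import math
from collections import defaultdict


def daily_sentiment(posts):
    """Average polarity per day; posts are (day, polarity) pairs with polarity +1 or -1."""
    by_day = defaultdict(list)
    for day, polarity in posts:
        by_day[day].append(polarity)
    return [sum(p) / len(p) for _, p in sorted(by_day.items())]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def lagged_correlation(sentiment, returns, lag=1):
    """Correlate sentiment on day t with the return on day t + lag (same day indexing)."""
    return pearson(sentiment[:len(sentiment) - lag], returns[lag:])


# Hypothetical, already-labeled forum posts (+1 bullish, -1 bearish) for days 1-5.
posts = [(1, 1), (1, 1), (1, -1), (2, -1), (2, -1), (3, 1), (4, 1), (4, 1), (5, -1)]
daily_returns = [0.01, 0.02, -0.03, 0.04, 0.03]  # hypothetical returns for days 1-5
scores = daily_sentiment(posts)
print(round(lagged_correlation(scores, daily_returns, lag=1), 3))
```

Shifting the returns by one day means each sentiment score is compared only with a return that happened after it, which avoids mixing up cause and reaction within the same day.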

Intelligent Fraud Detection System

The rapidly growing crypto market faces constant fraud attempts. CipherTrace [29] reported that crypto hacks, thefts, scams, and fraud caused losses of 432 million USD in the first four months of 2021 alone, compared with 4.5 billion USD in 2019 and 1.9 billion USD in 2020. Initial Coin Offering (ICO) scams target users with little knowledge of cryptocurrency. The ICO advisory firm Satis Group reported that over 80% of ICOs in 2017 were identified as scams [30].

The amount of money lost by cryptocurrency fraud in 2019, 2020, and the first four months of 2021, as shown on CipherTrace [29].

Here are some attempts to detect fraud and scams using cryptocurrency data analysis:

  • Bian et al. applied deep-learning-based text mining to data on 2,251 digital currencies to identify ICO scam projects, reporting 83% precision [31].
  • Xu and Livshits applied machine learning to identify indicators of pump-and-dump schemes in cryptocurrencies [32].
  • Bartoletti et al. applied classification models to identify indicators of Ponzi schemes on the Bitcoin network [33].
  • Chen et al. applied classification models to identify indicators of Ponzi schemes in Ethereum smart contracts [34].
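The results quoted in this article use different metrics: Bian et al. report precision, while Jourdan et al. report F1 scores. For a single class treated as positive (for example, "scam"), precision is the share of flagged items that are truly positive, recall is the share of truly positive items that were flagged, and F1 is their harmonic mean. The Python sketch below computes all three from hypothetical labels; the function name is illustrative.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 score, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels for six ICO projects and a classifier's predictions.
actual = ["scam", "legit", "scam", "scam", "legit", "legit"]
predicted = ["scam", "scam", "scam", "legit", "legit", "legit"]
p, r, f = precision_recall_f1(actual, predicted, positive="scam")
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")  # precision=0.67 recall=0.67 f1=0.67
```

For fraud detection the distinction matters: high precision means few legitimate projects are falsely accused, while high recall means few scams slip through.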

Final Remarks

In this article, we shared case studies and publications on applying artificial intelligence to cryptocurrency, with a focus on Bitcoin analytics. We discussed applications of data science and big data analytics in cryptocurrency in three parts: analyses and predictions, blockchain security enhancement, and risk management.

Compared with AI, cryptocurrency is a relatively new field. Although data-driven approaches offer numerous ways to advance cryptocurrency operations, public sentiment seems to be, for now, the key determinant of fluctuations in crypto value. We are curious to see how this technology develops in the future.

If you find this crypto analytics article interesting, feel free to share it and spark a conversation with your networks. Please consider subscribing to our newsletters so that we can share our latest data science insights with you.



[1] H. Hassani, X. Huang and E. Silva, “Big-Crypto: Big Data, Blockchain and Cryptocurrency”, Big Data and Cognitive Computing, vol. 2, no. 4, p. 34, 2018. doi: 10.3390/bdcc2040034.

[2] R. de Best, “Bitcoin price history 2013-2021 | Statista”, Statista, 2021. [Online]. Available: https://www.statista.com/statistics/326707/bitcoin-price-index/. [Accessed: 17 May 2021].

[3] “Cryptocurrency Price”, Coinbase.com, 2021. [Online]. Available: https://www.coinbase.com/price. [Accessed: 17 May 2021].

[4] R. de Best, “Market cap of several cryptocurrencies 2020 | Statista”, Statista, 2021. [Online]. Available: https://www.statista.com/statistics/730782/cryptocurrencies-market-capitalization/. [Accessed: 17 May 2021].

[5] “All Cryptocurrencies | CoinMarketCap”, CoinMarketCap, 2021. [Online]. Available: https://coinmarketcap.com/all/views/all/. [Accessed: 17 May 2021].

[6] R. de Best, “Topic: Cryptocurrencies”, Statista, 2021. [Online]. Available: https://www.statista.com/topics/4495/cryptocurrencies/#dossierSummary. [Accessed: 17 May 2021].

[7] “What is Cryptocurrency?”, Coinbase.com, 2021. [Online]. Available: https://www.coinbase.com/learn/crypto-basics/what-is-cryptocurrency. [Accessed: 17 May 2021].

[8] X. Li and C. Wang, “The technology and economic determinants of cryptocurrency exchange rates: The case of Bitcoin”, Decision Support Systems, vol. 95, no. 1, pp. 49-60, 2017. doi: 10.1016/j.dss.2016.12.001.

[9] Y. Peng, P. Albuquerque, J. Camboim de Sá, A. Padula and M. Montenegro, “The best of two worlds: Forecasting high frequency volatility for cryptocurrencies and traditional currencies with Support Vector Regression”, Expert Systems with Applications, vol. 97, no. 1, pp. 177-192, 2018. doi: 10.1016/j.eswa.2017.12.004.

[10] H. Jang and J. Lee, “An Empirical Study on Modeling and Prediction of Bitcoin Prices With Bayesian Neural Networks Based on Blockchain Information”, IEEE Access, vol. 6, no. 1, pp. 5427-5437, 2018. doi: 10.1109/access.2017.2779181.

[11] S. McNally, J. Roche and S. Caton, “Predicting the Price of Bitcoin Using Machine Learning”, 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), pp. 339-343, 2018. doi: 10.1109/PDP2018.2018.00060. [Accessed: 17 May 2021].

[12] M. Nakano, A. Takahashi and S. Takahashi, “Bitcoin Technical Trading With Artificial Neural Network”, SSRN Electronic Journal, vol. 510, pp. 587-609, 2018. doi: 10.2139/ssrn.3128726.

[13] “How 11 Trends Indicate that AI is the Future of Cryptocurrency Trading”, Hackernoon.com, 2018. [Online]. Available: https://hackernoon.com/how-11-trends-indicatethat-ai-is-the-future-of-cryptocurrency-trading-a38c0437450d. [Accessed: 17 May 2021].

[14] H. Argawal, “Autonio review: How Does Autonio Works?”, CoinSutra, 2019. [Online]. Available: https://coinsutra.com/autonio-ai-trading-bot/. [Accessed: 17 May 2021].

[15] H. Hassani, X. Huang and E. Silva, Fusing Big Data, Blockchain and Cryptocurrency. Cham: Palgrave Macmillan UK, 2020, pp. 77-98.

[16] H. Sun Yin and R. Vatrapu, “A first estimation of the proportion of cybercriminal entities in the bitcoin ecosystem using supervised machine learning”, 2017 IEEE International Conference on Big Data (Big Data), pp. 3690-3699, 2017. doi: 10.1109/BigData.2017.8258365. [Accessed: 17 May 2021].

[17] M. Jourdan, S. Blandin, L. Wynter and P. Deshpande, “A Probabilistic Model of the Bitcoin Blockchain”, CoRR, vol. abs/1812.05451, 2018. [Online]. Available: http://arxiv.org/abs/1812.05451. [Accessed: 17 May 2021].

[18] D. Di Francesco Maesa, A. Marino and L. Ricci, “Detecting artificial behaviours in the Bitcoin users graph”, Online Social Networks and Media, vol. 3-4, pp. 63-74, 2017. doi: 10.1016/j.osnem.2017.10.006.

[19] S. Dey, “A Proof of Work: Securing Majority-Attack in Blockchain Using Machine Learning and Algorithmic Game Theory”, International Journal of Wireless and Microwave Technologies, vol. 8, no. 5, pp. 1-9, 2018. doi: 10.5815/ijwmt.2018.05.01.

[20] C. Akcora, A. Dey, Y. Gel and M. Kantarcioglu, “Forecasting Bitcoin Price with Graph Chainlets”, PAKDD 2018: Advances in Knowledge Discovery and Data Mining, pp. 765-776, 2018. [Accessed: 17 May 2021].

[21] “#1 Bitcoin Halving 2024 Countdown & Date ETA (BTC Clock)”, Bitcoinclock.com. [Online]. Available: https://www.bitcoinclock.com. [Accessed: 17 May 2021].

[22] S. Colianni, S. Rosales and M. Signorotti, “Algorithmic Trading of Cryptocurrency Based on Twitter Sentiment Analysis”, Cs229.stanford.edu, 2015. [Online]. Available: http://cs229.stanford.edu/proj2015/029_report.pdf. [Accessed: 17 May 2021].

[23] Y. Kim et al., “Predicting Fluctuations in Cryptocurrency Transactions Based on User Comments and Replies”, PLOS ONE, vol. 11, no. 8, p. e0161197, 2016. doi: 10.1371/journal.pone.0161197.

[24] H. Lu, L. Yang, P. Lin, T. Yang and A. Chen, “A Study on Adoption of Bitcoin in Taiwan: Using Big Data Analysis of Social Media”, Association for Computing Machinery, 2017. doi: 10.1145/3162957.3163046. [Accessed: 17 May 2021].

[25] R. Phillips and D. Gorse, “Predicting cryptocurrency price bubbles using social media data and epidemic modelling”, 2017 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1-7, 2017. doi: 10.1109/SSCI.2017.8280809. [Accessed: 17 May 2021].

[26] V. Karalevicius, N. Degrande and J. De Weerdt, “Using sentiment analysis to predict interday Bitcoin price movements”, The Journal of Risk Finance, vol. 19, no. 1, pp. 56-75, 2018. doi: 10.1108/jrf-06-2017-0092.

[27] D. Garcia and F. Schweitzer, “Social signals and algorithmic trading of Bitcoin”, Royal Society Open Science, vol. 2, no. 9, p. 150288, 2015. doi: 10.1098/rsos.150288.

[28] F. Mai, Q. Bai, J. Shan, X. Wang and R. Chiang, “The Impacts of Social Media on Bitcoin Performance”, ICIS, 2015. [Accessed: 17 May 2021].

[29] “Cryptocurrency Crime and Anti-Money Laundering Report, May 2021”, CipherTrace, 2021. [Online]. Available: https://ciphertrace.com/cryptocurrency-crime-and-anti-money-laundering-report-may-2021/. [Accessed: 17 May 2021].

[30] S. Dowlat, “Cryptoasset market coverage initiation: Network creation”, Research.bloomberg.com, 2018. [Online]. Available: https://research.bloomberg.com/pub/res/d28giW28tf6G7T_Wr77aU0gDgFQ. [Accessed: 17 May 2021].

[31] S. Bian et al., “IcoRating: A Deep-Learning System for Scam ICO Identification”, CoRR, vol. abs/1803.03670, 2018. [Online]. Available: http://arxiv.org/abs/1803.03670. [Accessed: 17 May 2021].

[32] J. Xu and B. Livshits, “The Anatomy of a Cryptocurrency Pump-and-Dump Scheme”, CoRR, 2019. [Online]. Available: https://arxiv.org/abs/1811.10109. [Accessed: 17 May 2021].

[33] M. Bartoletti, B. Pes and S. Serusi, “Data Mining for Detecting Bitcoin Ponzi Schemes”, 2018 Crypto Valley Conference on Blockchain Technology (CVCBT), pp. 75-84, 2018. doi: 10.1109/CVCBT.2018.00014. [Accessed: 17 May 2021].

[34] W. Chen, Z. Zheng, E. Ngai, P. Zheng and Y. Zhou, “Exploiting Blockchain Data to Detect Smart Ponzi Schemes on Ethereum”, IEEE Access, vol. 7, pp. 37575-37586, 2019. doi: 10.1109/access.2019.2905769.

