DEC 27, 2021
Top Cryptocurrency Datasets for Data Scientists
NOV 29, 2021
The purpose of this article is to discuss three cryptocurrency datasets so that you can practice data science on a more current topic. As a disclaimer, this is not financial advice; this article is simply an aggregation of open-source datasets for you to experiment with. Many people start learning data science with the S&P 500 stocks dataset, but with the rise of crypto, it is time to include more discussion around this type of data. With that said, let’s look closer at these datasets so you know what to use for your next project.
This first dataset is actually a combination of 50 datasets. Like typical stock datasets, it includes the expected open, high, low, and close prices per day. The great part is that it offers a combined dataset in addition to the separate per-cryptocurrency datasets, in case you would like to focus on crypto in general rather than on a specific coin. All of these datasets are good for practicing time-series problems and predictions.
Here is the link to this dataset: TOP 50 Cryptocurrencies Historical Prices 
Date range: 2017-07-10 to 2021-08-23
This dataset is useful for practicing the prediction of a continuous numeric value, such as the open, high, low, or close price. You could also predict the volume of transactions in a day, or the percentage change from the previous day, as an exercise. The values are already in good shape, so you will most likely not need to perform any transformations.
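As a rough illustration of the percentage-change exercise, here is a minimal pandas sketch. The column names `Date` and `Close` are assumptions (check the actual CSV headers), the helper name `add_daily_pct_change` is made up for this example, and the sample rows are invented numbers, not real prices.

```python
import pandas as pd


def add_daily_pct_change(prices: pd.DataFrame, price_col: str = "Close") -> pd.DataFrame:
    """Return a date-sorted copy with a 'pct_change' column (percent vs. previous row)."""
    # "Date" and the default "Close" are assumed column names, not confirmed headers.
    out = prices.sort_values("Date").reset_index(drop=True)
    # pct_change() returns a fraction, so multiply by 100 to get a percentage.
    out["pct_change"] = out[price_col].pct_change() * 100
    return out


# Made-up rows standing in for one coin's daily data.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
    "Close": [100.0, 110.0, 99.0],
})
print(add_daily_pct_change(sample))
```

The first row has no previous day, so its percentage change is missing (NaN); you would typically drop that row before training a model.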
This next dataset is similar but contains fewer cryptocurrencies, while also having the benefit of more recent data. At the time of this article, the most recent date is November 2nd, 2021, which is about two months more up-to-date than the previous dataset. The data is split into a separate CSV for each cryptocurrency, so if you want to combine them all, you will need to take the name from each file and add it as an index or a new column.
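One way to combine the per-cryptocurrency files might look like the sketch below. It assumes all the CSVs sit in one folder and that each file name (minus the `.csv` extension) identifies the coin; the function name `combine_crypto_csvs` and the new column name `crypto` are made up for this example.

```python
from pathlib import Path

import pandas as pd


def combine_crypto_csvs(folder: str) -> pd.DataFrame:
    """Stack every CSV in `folder`, tagging each row with its source file name."""
    frames = []
    for csv_path in sorted(Path(folder).glob("*.csv")):
        frame = pd.read_csv(csv_path)
        # Assumption: the file name identifies the coin, e.g. "bitcoin.csv" -> "bitcoin".
        frame["crypto"] = csv_path.stem
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"no CSV files found in {folder!r}")
    # ignore_index=True gives the combined frame one clean 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

You would call it with the path to wherever you unpacked the download, for example `combine_crypto_csvs("path/to/csv_folder")`. If you prefer the name as an index rather than a column, `set_index("crypto")` on the result would do that.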
Here is the link to this dataset: Top 10 Cryptocurrencies Historical Dataset 
Date range: 2017-11-08 to 2021-11-01
As you can imagine, you could perform similar tasks with this dataset as with the previous one. You could also predict the high and low prices of the day, or create a new feature, such as the total change from high to low on a given day. With this dataset, you might want to transform some values to make them easier to work with, such as converting the date to a DateTime format and the volume to a numeric format.
This dataset is the least up-to-date; however, it is still fairly recent, with the latest data from August 2021. It also comes as a single CSV file, which can be a plus.
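The transformations and the high-to-low feature mentioned above could be sketched roughly as follows. The column names (`Date`, `High`, `Low`, `Volume`) and the idea that volume arrives as text with thousands separators are assumptions for illustration; inspect the real files before relying on either. The sample rows are invented.

```python
import pandas as pd


def clean_and_add_range(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with parsed dates, numeric volume, and a high-low range feature."""
    out = raw.copy()
    # Parse date strings into pandas datetimes.
    out["Date"] = pd.to_datetime(out["Date"])
    # Assumption: volume is stored as text like "1,234,567"; strip the commas first.
    volume_text = out["Volume"].astype(str).str.replace(",", "", regex=False)
    # errors="coerce" turns anything unparseable into NaN instead of raising.
    out["Volume"] = pd.to_numeric(volume_text, errors="coerce")
    # New feature: total change from high to low within the day.
    out["high_low_range"] = out["High"] - out["Low"]
    return out


# Made-up rows, not real market data.
sample = pd.DataFrame({
    "Date": ["2021-10-31", "2021-11-01"],
    "High": [105.0, 120.0],
    "Low": [95.0, 100.0],
    "Volume": ["1,234,567", "2,000,000"],
})
print(clean_and_add_range(sample).dtypes)
```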
Here is the link to this dataset: Historical data on the trading of cryptocurrencies 
Date range: 2016-01-01 to 2021-08-09
There are a few more columns in this dataset compared to the other two in this article. Several of them are entirely new: Market Cap, Capitalization Change 1 Day, BTC Price Change 1 Day, and Crypto Type. You could use any of these new columns as your target variable. The data types look good as well, so you will most likely not need to transform them.
Overall, there will always be a ton of datasets, whether for traditional stocks or cryptocurrencies, but I hope this aggregation and summary helps you quickly decide on one to use. Whatever you want to predict, and whether or not the data needs transforming first, these are all great datasets for practicing data science modeling.
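If you pick one of these columns as a target, a simple way to set up practice data is to separate it from the features and split by date, so the model is evaluated on the most recent rows. The sketch below is only one possible setup: the helper name `chronological_split` is made up, the exact column headers (`Date`, `Market Cap`) are assumed to match the names above, and the sample rows are invented.

```python
import pandas as pd


def chronological_split(data: pd.DataFrame, target: str,
                        date_col: str = "Date", test_fraction: float = 0.2):
    """Sort by date, then hold out the most recent rows as the test set."""
    ordered = data.sort_values(date_col).reset_index(drop=True)
    # Hold out at least one row; the rest is used for training.
    n_test = max(1, round(len(ordered) * test_fraction))
    cutoff = len(ordered) - n_test
    features = ordered.drop(columns=[target])
    labels = ordered[target]
    return (features.iloc[:cutoff], features.iloc[cutoff:],
            labels.iloc[:cutoff], labels.iloc[cutoff:])


# Made-up rows in shuffled date order, not real market data.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-03", "2021-01-01", "2021-01-05",
                            "2021-01-02", "2021-01-04"]),
    "Close": [3.0, 1.0, 5.0, 2.0, 4.0],
    "Market Cap": [30.0, 10.0, 50.0, 20.0, 40.0],
})
X_train, X_test, y_train, y_test = chronological_split(sample, target="Market Cap")
print(len(X_train), len(X_test))
```

Splitting by date rather than at random avoids training on days that come after the ones you test on, which matters for time-series data like this.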
* TOP 50 Cryptocurrencies Historical Prices
* Top 10 Cryptocurrencies Historical Dataset
* Historical data on the trading of cryptocurrencies
I hope you found my article both interesting and useful. Please feel free to comment down below if you agree or disagree with the datasets I included. Why or why not? What other datasets do you think are important to include? These summaries could certainly be expanded further, but I hope I was able to shed some light on some interesting datasets for cryptocurrency data. Thank you for reading!