Data goes through different forms before it gets converted into useful insights. We take a look at examples of raw data. This will help us appreciate the underpinning raw material of analytics. It will also help us in understanding different types of data and how they have to be worked upon. This is the first step in using data analysis for decision-making.
The raw data
Raw data is a collection of facts, figures, and other information that is available for analysis. This data is called raw because it has not been worked upon. Overall, there are three stages of data in analytics: raw data, processed data, and analyzed data.
Data collection
The first step in data analytics is the collection of data. The data that we collect is typically raw data, and it can come from different sources. For example, we can collect data from primary sources like interviews or surveys. We can also use secondary sources like government statistics and other publicly available data.
In certain cases, we may be interested in analyzing data that already sits in a database. In such cases, we collect the data by running SQL queries. This data is also an example of raw data.
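As a minimal sketch of collecting raw data with a SQL query, the snippet below uses Python's built-in `sqlite3` module. The `orders` table, its columns, and its rows are hypothetical and exist only for illustration; a real project would connect to its own database instead of an in-memory one.

```python
import sqlite3

# Hypothetical in-memory database standing in for a real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Asha", 120.0), (2, "Ben", 75.5), (3, "Asha", 42.0)],
)

# Collect the raw data: fetch the rows as stored, with no aggregation.
raw_rows = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
conn.close()

for row in raw_rows:
    print(row)
```

The rows come back untouched, which is what makes them raw data: any cleaning or summarizing happens in a later stage.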
Why is raw data important?
Raw data is important because it is the first step in the journey of data from potentially useful to a useful form. A lot of attention has to be given before the data is collected. Once the data has been collected, we are limited by the quality of data that has been collected. If the data collection is not done properly, then we are stuck with less-than-optimal raw data. This could lead to less-than-optimal results at the end of the analysis process.
Secondly, raw data is also important because it can offer some insights on its own. There are times when analysts can find patterns in the raw data itself. One way to find patterns in raw data is to run SQL queries that return a summary of the database. More commonly, however, raw data can be inspected in Excel itself. Sometimes we are able to make sense of the data simply by looking at the tables and their values in Excel.
Thirdly, raw data is also important because it sets the tone for the later stages of analysis. Care should be taken at this stage to represent and record the data effectively. Correct data entry helps the process later on by reducing the possibility of errors. Some of the errors that show up in data visualization actually originate at the raw data management stage.
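A summary query of this kind can be sketched as follows, using Python's built-in `sqlite3` module. The `responses` table and its values are hypothetical, made up only to show the shape of a `GROUP BY` summary.

```python
import sqlite3

# Hypothetical survey responses; the table and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (respondent INTEGER, region TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO responses VALUES (?, ?, ?)",
    [(1, "North", 4), (2, "North", 5), (3, "South", 2), (4, "South", 3), (5, "South", 4)],
)

# Summarize the raw rows: number of responses and average score per region.
summary = conn.execute(
    """
    SELECT region, COUNT(*) AS n, AVG(score) AS avg_score
    FROM responses
    GROUP BY region
    ORDER BY region
    """
).fetchall()
conn.close()

for region, n, avg_score in summary:
    print(f"{region}: {n} responses, average score {avg_score:.2f}")
```

Even this small summary hints at a pattern (one region scoring higher than the other) without any further processing.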
What are the types of raw data?
There are many different types of raw data. However, for our purposes, we shall define raw data in terms of the data collection procedure. Broadly, we can say that there are two types: qualitative data and quantitative data.
Qualitative Data | Quantitative Data |
Interview recordings | Tables extracted from databases |
Transcripts | Survey responses captured via scales |
Videos | Employee performance data |
Examples of Raw Data
Survey of weights of students in a classroom
In this example, we look at the distribution of the weights of students in a classroom. The weights represent the raw data. This data can be arranged or summarized to produce a meaningful output. For example, we can average all the weights to find the average weight of a student in the classroom. We can also find the maximum and the minimum weight of a student from this table.
Student number | Weight |
1 | 61 |
2 | 79 |
3 | 90 |
4 | 53 |
5 | 79 |
6 | 102 |
7 | 45 |
8 | 67 |
9 | 105 |
10 | 73 |
11 | 72 |
12 | 85 |
13 | 81 |
14 | 77 |
15 | 98 |
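The summaries mentioned for this table (average, maximum, and minimum) can be computed with a few lines of Python. This is a minimal sketch: the weights are copied from the table, and no unit is assumed because the table does not state one.

```python
from statistics import mean

# Raw data: weights of the 15 students, in the order listed in the table.
weights = [61, 79, 90, 53, 79, 102, 45, 67, 105, 73, 72, 85, 81, 77, 98]

average_weight = mean(weights)
heaviest = max(weights)
lightest = min(weights)

print(f"Students: {len(weights)}")
print(f"Average weight: {average_weight:.1f}")
print(f"Maximum weight: {heaviest}")
print(f"Minimum weight: {lightest}")
```

For this table the average comes to 77.8, with a maximum of 105 and a minimum of 45.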
An example of Sales Data
When we collect data from a small sample and on a small number of variables, that data is typically small data. Although big data has received a lot of emphasis in recent years, even small data can be quite useful for analysis. This type of data is much faster to collect and much easier to process. A smaller data set can even be processed manually by the analyst. The table below is one of the simplest examples of raw data.
SKU | Item name | Brand | Price | Sales |
300 | Writex 100 | Benolys | $2.50 | 100 |
301 | Writex 200 | Benolys | $3.50 | 80 |
302 | Phantom ball | Benolys | $3.00 | 35 |
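A small table like this is easy to process in code as well as by hand. The sketch below computes revenue per SKU and in total. It assumes that the "Sales" column is the number of units sold, which the table does not state explicitly. The `revenue` helper is a name introduced here for illustration.

```python
from decimal import Decimal

# Raw rows copied from the table; prices are kept as the original strings.
raw_sales = [
    {"sku": 300, "item": "Writex 100", "brand": "Benolys", "price": "$2.50", "sales": 100},
    {"sku": 301, "item": "Writex 200", "brand": "Benolys", "price": "$3.50", "sales": 80},
    {"sku": 302, "item": "Phantom ball", "brand": "Benolys", "price": "$3.00", "sales": 35},
]

def revenue(row):
    """Return price * units sold, assuming 'sales' is a unit count."""
    price = Decimal(row["price"].lstrip("$"))  # "$2.50" -> Decimal("2.50")
    return price * row["sales"]

revenue_by_sku = {row["sku"]: revenue(row) for row in raw_sales}
total_revenue = sum(revenue_by_sku.values())

for sku, amount in revenue_by_sku.items():
    print(f"SKU {sku}: ${amount}")
print(f"Total: ${total_revenue}")
```

`Decimal` is used instead of floating-point numbers so that currency amounts stay exact.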
Raw audio data from interview recordings
Archived data formats | Contemporary formats |
Reel-to-reel tapes | WAV: resolution usually 16 or 24 bit, sampled at rates between 44.1 kHz and 192 kHz. Bitrate: 1411 kbps for CD-quality resolution (16 bit, 44.1 kHz) |
Audio cassettes | FLAC: resolution usually 16 or 24 bit, sampled at rates between 44.1 kHz and 192 kHz. Bitrate: 500 to 2500 kbps |
Digital or digitized recordings | MP3: resolution most commonly 16 bit, 44.1 kHz. Bitrate: usually between 64 kbps and 320 kbps |
Other formats like microtapes, etc. | Other formats like Ogg and AMR are more suited for smaller file sizes |
One example of raw data is the audio files that we get from recording interviews. The reference example for this section is a recorded audio interview taken from the US Library of Congress.
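The 1411 kbps figure for CD-quality WAV audio in the table follows from the arithmetic for uncompressed audio: sample rate × bit depth × number of channels. The sketch below reproduces it, assuming two channels (stereo). The function names and the 60-minute interview length are hypothetical, chosen only to show how quickly uncompressed audio grows.

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels=2):
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000

def file_size_mb(bitrate_kbps, minutes):
    """Approximate audio payload size in megabytes (10^6 bytes)."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return bytes_per_second * minutes * 60 / 1_000_000

# CD quality: 44.1 kHz, 16 bit, stereo (the stereo part is an assumption).
cd_quality = pcm_bitrate_kbps(44_100, 16)

# Hypothetical 60-minute interview stored as uncompressed CD-quality audio.
interview_size = file_size_mb(cd_quality, minutes=60)

print(f"CD-quality bitrate: {cd_quality:.1f} kbps")
print(f"60-minute interview: about {interview_size:.0f} MB")
```

The same functions can be used to compare other resolutions in the table, for example 24 bit at 192 kHz.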
Raw video data
Another example of raw qualitative data is raw video data. Video data can also be available in different formats and sizes. Usually, the raw data that is captured from a camera has a high bitrate, so it needs a lot of space for archival. However, raw data captured from smaller or consumer-grade video recorders and smartphones may already be compressed.
Archived data formats | Contemporary video formats |
Film reels: 8mm, 16mm, and 35mm are the most common sizes. | Intermediate digital video formats: common formats are Apple ProRes, DNxHD, and GoPro CineForm. These are visually lossless rather than strictly lossless. The files are large and editing-friendly. Good for archival. |
Video tapes | RAW video: not to be confused with the raw captured video. RAW is also a method of capturing video that allows more flexibility when editing later. However, it is not so useful for research purposes because it takes a lot of space. It is more suited to video that needs to be edited and released. |
A digitized version of analog video | Compressed video: common formats are H.264 and H.265 (HEVC). If the video needs to be manually analyzed or transcoded, then this is the most suitable format. Small file sizes; easy to handle, store, and play back. |
Social Media data as raw data
Social media analysis is another important way to understand market patterns and get some user insights. However, it is also more challenging due to the following aspects:
- Social media data is an example of unstructured raw data. Unstructured data is difficult to analyze.
- It is also difficult to capture social media data, except on platforms like Twitter and YouTube.
- A lot of social media data may not be useful. Therefore, it takes more effort to find useful patterns.
What are the common pitfalls with Raw Data?
- More data is not always better
- Starting with raw data rather than with a hypothesis
- Trying to find answers in raw data
- Not handling raw data properly