The first main requirement of Big Data is data storage. At this scale, it is hard to design a monolithic architecture that can house all the information, so distributed solutions that allow unified access to the information sources are required. In many Internet applications, these data must also be stored and processed quickly to offer analytics in real time. The nature and structure of the data, which in these cases are often rather heterogeneous, must also be taken into account. In most cases, solutions based on non-relational (NoSQL) databases adapt better to this scenario than traditional relational databases.
Once a solution has been provided for storing and accessing large data volumes, many applications offer the possibility of analysing them. Distributed data analysis technologies, such as Hadoop and MapReduce, provide this functionality and enable many applications, such as the following.
Recommendation systems: these use behavioural information about each user to predict his or her intentions and interests, and thus offer suitable content. They are used very frequently in e-commerce.
Sentiment analysis: on the basis of public conversations (e.g. Twitter, forums) and other Web 2.0 elements, the tastes and behaviour of each user are predicted for various purposes.
Catastrophe prediction: the large volumes of data available are used to detect such events as fires or earthquakes, so that their impact can be predicted and an early response generated.
Games: Deep Blue (chess) and Watson (question answering) are examples of programs that analyse large volumes of game data to defeat human adversaries.
Categorisation and recognition: places, faces, or people are identified by analysing the large volumes of this type of data available online.
Medicine: personalised genomic medicine (still in the field of research) analyses and integrates genomic and clinical data for early diagnosis and better application of therapies.
Smart behaviour of public services: using the information obtained from data compiled by smart sensors, the distribution and consumption of crucial resources, such as water and electricity, can be improved.
Risk modelling: some banks and cutting-edge investment companies use technologies for the analysis of large data volumes to determine the risk of operations, evaluating a large number of hypothetical financial scenarios.
Fraud detection: using techniques to combine user behaviour databases and transactional data, fraudulent activity can be detected, such as use of a stolen credit card.
Network monitoring: server networks yield a large amount of data that can be analysed to identify bottlenecks or attacks. This kind of analysis can also be applied to other types of networks, such as transport networks, e.g. in order to optimise fuel consumption.
Research and development: some companies with a strong research component, such as pharmaceuticals, analyse large numbers of documents (e.g. scientific papers) and other historical data to improve the development of their products.
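As a concrete illustration of the MapReduce model mentioned above, the following sketch counts word frequencies across a small set of documents in plain Python. The corpus and function names are illustrative assumptions, not part of any particular framework; in a real deployment such as Hadoop, the map, shuffle, and reduce phases each run in parallel across the nodes of a cluster.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical corpus, standing in for data split across many nodes.
documents = [
    "big data requires distributed storage",
    "distributed analysis with mapreduce",
    "mapreduce splits work into map and reduce phases",
]

def map_phase(doc):
    """Map: emit a (key, value) pair for every word in one document."""
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the grouped values for one key into a result."""
    return key, sum(values)

mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["mapreduce"])  # -> 2
```

Because each map call touches only one document and each reduce call only one key, the same code structure scales out: the framework simply assigns different documents and keys to different machines.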