Data mining in eCommerce: from big data to information

1 Answer

What is data mining?
Data mining application areas
Data mining methods
Limitations of data mining


Data plays a central role in eCommerce. Many online stores collect data on customer habits, shopping carts and products in order to optimize their sales processes. However, a mountain of data alone says little. Only when this raw data is turned into useful information does it help improve store operations and increase sales. This is where data mining, a set of tools for analyzing large volumes of data, comes in.

What is data mining?

To understand this concept, which draws on computer science and statistics, it helps to unpack the metaphor in its name. If the result of the near-complete tracking of user behavior on the Internet is viewed as a seemingly useless mountain of data, data mining provides the tools needed to dig through this vast amount of data and extract the relevant information from it. These tools consist of statistical methods that identify behavior patterns and connections in data that, taken on their own, mean nothing.

Data mining is often associated with big data, a concept that refers to data sets too large for conventional analysis, which therefore require computational processing. The data mining process itself, however, can be applied to data sets of any size.

Strictly speaking, data mining is one stage of a larger process known as Knowledge Discovery in Databases (KDD), which covers the following steps:

  • Selecting the database to analyze
  • Pre-processing, which cleans and prepares the data
  • Transforming the data into the form the analysis requires
  • Analyzing the data with mathematical methods (data mining)
  • Interpreting the results

The information extracted through KDD can be applied in a wide variety of areas, for example in the strategic planning of an online business or in marketing decisions.
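The KDD steps listed above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the order records, the field names and the function names are hypothetical, and the "mining" step is a simple revenue aggregation standing in for a real statistical method.

```python
# Hypothetical raw order records; None marks a missing value.
RAW_ORDERS = [
    {"customer": "c1", "product": "shoes", "price": 59.0},
    {"customer": "c2", "product": "socks", "price": 5.0},
    {"customer": "c1", "product": "socks", "price": 5.0},
    {"customer": "c3", "product": None, "price": 12.0},     # incomplete
    {"customer": "c2", "product": "shoes", "price": 59.0},
    {"customer": "c4", "product": "socks", "price": None},  # incomplete
    {"customer": "c3", "product": "socks", "price": 5.0},
]


def select(records):
    """Step 1: choose the data to analyze (here: every order record)."""
    return list(records)


def preprocess(records):
    """Step 2: clean the data by dropping records with missing fields."""
    return [r for r in records if None not in r.values()]


def transform(records):
    """Step 3: reshape into the form the analysis needs: (product, price)."""
    return [(r["product"], r["price"]) for r in records]


def mine(pairs):
    """Step 4: the analysis itself (a stand-in: revenue per product)."""
    revenue = {}
    for product, price in pairs:
        revenue[product] = revenue.get(product, 0.0) + price
    return revenue


def interpret(revenue):
    """Step 5: read the result against a question (top product by revenue)."""
    return max(revenue, key=revenue.get)


revenue = mine(transform(preprocess(select(RAW_ORDERS))))
best_seller = interpret(revenue)
```

Keeping each step as its own function mirrors the separation in the list: the mining step only ever sees data that has already been selected, cleaned and transformed.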


Data mining application areas

Data mining allows you to optimize eCommerce on a scientific basis. The large databases of online stores are the starting point for drawing conclusions and making forecasts. Statistically prepared and visualized in a structured way, this data allows online store administrators to identify the factors that influence the success of an online business and to adjust their strategies accordingly. In this context, data mining is used to:

  • Segment markets
  • Analyze the demand
  • Create buyer profiles
  • Analyze shopping carts
  • Calculate product prices
  • Identify flaws in sales processes
  • Forecast contract terminations

Data mining methods

To extract information that is relevant to companies, several methods have been developed that identify connections, models and significant patterns using standard statistical procedures:

  • Outlier detection: outliers are values that lie extremely far from the rest because they deviate from a general pattern or trend. In data mining, this analysis is used, for example, to identify conspicuous transactions that could indicate credit card fraud.
  • Cluster analysis or clustering: a cluster is a group of objects that resemble one another. The objective of this analysis is to segment unstructured data, using algorithms that search databases for similarity structures in order to identify new clusters. Unlike classification, cluster analysis aims to discover groupings that were not known in advance. Data that cannot be assigned to any group can be interpreted as outliers. A very common application in eCommerce is the identification of user groups.
  • Classification (discriminant analysis): while cluster analysis focuses on identifying new groups, classification assigns data to predefined classes. The assignment is based on characteristics of the individual data records. A very common way to classify data automatically is the decision tree, a prediction model used in artificial intelligence that sorts objects by checking a series of successive conditions. It works through nodes, each of which tests one characteristic of the object. Whether or not the object has that characteristic determines which node (that is, which characteristic) is checked next. In eCommerce, this data mining procedure is used to assign customers to predefined groups.
  • Association analysis (association rules): this type of analysis seeks to identify connections that can be formulated as rules. For online stores, it can be applied to find correlations in a typical shopping cart, following the pattern "customers who buy product A also buy product B".
  • Regression analysis: this type of statistical analysis creates models that explain a dependent variable in terms of independent variables. In practice, it makes it possible to forecast the sales of a product, for example by relating them to the product's price and the customers' average income in a regression model.
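Three of the methods above (association rules, outlier detection and regression) can be illustrated with the Python standard library alone. The following is a minimal sketch, not a production implementation: the sample carts and all function names are hypothetical, the outlier test uses a simple z-score with an arbitrarily chosen threshold of 2.0, and the regression uses a single independent variable (price) instead of the two mentioned in the text.

```python
from statistics import mean, pstdev

# Hypothetical shopping carts, each a set of product identifiers.
CARTS = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B"},
]


def support(carts, items):
    """Share of all carts that contain every item in `items`."""
    items = set(items)
    return sum(1 for cart in carts if items <= cart) / len(carts)


def confidence(carts, antecedent, consequent):
    """Share of carts containing `antecedent` that also contain `consequent`."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_antecedent = sum(1 for cart in carts if antecedent <= cart)
    n_both = sum(1 for cart in carts if both <= cart)
    return n_both / n_antecedent if n_antecedent else 0.0


def zscore_outliers(values, threshold=2.0):
    """Values lying more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Rule "customers who buy A also buy B"
rule_support = support(CARTS, {"A", "B"})
rule_confidence = confidence(CARTS, {"A"}, {"B"})

# Order amounts with one conspicuous value
suspicious = zscore_outliers([20, 22, 19, 21, 20, 500])

# Units sold as a function of price (made-up, perfectly linear data)
slope, intercept = fit_line([10, 12, 14, 16], [100, 90, 80, 70])
```

With the sample carts, three of the five carts contain both A and B, and three of the four carts containing A also contain B. Note that a high confidence value describes a tendency in the data, not a guarantee for any individual customer.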

Limitations of data mining

Data mining brings together statistical methods that allow a fundamentally objective analysis of databases. However, the subjective choice of the type of analysis, the algorithms and their parameters to suit certain objectives can distort the results, sometimes intentionally. One way to reduce this risk could be to use an external data mining service.

The state of the database is also critical to the quality of the information extracted. Representative results can only be obtained when the available data is itself representative. This is why, in most cases, the database is pre-processed before the actual data mining process begins, removing gaps and distortions.
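As a rough illustration of such pre-processing, the sketch below removes exact duplicate records and fills missing values with the median of the known values. The records, field names and function names are hypothetical, and median imputation is just one possible way to handle gaps; dropping incomplete records is an equally common choice.

```python
from statistics import median

# Hypothetical customer records; None marks a missing value.
RECORDS = [
    {"id": 1, "age": 34, "orders": 5},
    {"id": 2, "age": None, "orders": 2},
    {"id": 3, "age": 28, "orders": 7},
    {"id": 3, "age": 28, "orders": 7},    # exact duplicate
    {"id": 4, "age": 45, "orders": None},
]


def drop_duplicates(records):
    """Keep only the first occurrence of each identical record."""
    seen, unique = set(), []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def impute_median(records, field):
    """Replace missing values of `field` with the median of the known ones."""
    known = [r[field] for r in records if r[field] is not None]
    fill = median(known)
    return [
        {**r, field: fill if r[field] is None else r[field]}
        for r in records
    ]


clean = drop_duplicates(RECORDS)
for field in ("age", "orders"):
    clean = impute_median(clean, field)
```

How gaps are filled is itself one of the subjective choices that can influence the results, so it is worth documenting alongside the analysis.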

Finally, we must not forget that data mining delivers its results in the form of patterns and connections. To obtain answers, these results must be interpreted in light of the questions and objectives defined beforehand.