+4 votes
250 views
in Security by (242k points)
reopened
Hash tables: quick access to hash values ​​from the database

1 Answer

+5 votes
by (1.6m points)
edited
 
Best answer

What principle does a hash table follow?
What variants of the hash process are there?
Advantages and disadvantages of hash tables
Hash security
Accelerate database processes

image

Hash tables: quick access to hash values ​​from the database

Where can I find the chapter that interests me in a book? Where is Ana García's request going? Where can I buy that watch with the brown leather strap? What these phrases have in common, besides being questions, is that they ask about a place: where? where? Another point in common is that all three assume that the object sought exists. This structure of thought can easily be used as an example of what happens in a database..

Imagine an online store with thousands of items and customers. The data for all of them are stored in databases: the customer searches a database for a specific item and places his order; the shipper uses a database to assign the item to the buyer and their postal address. This process involves sorting, adding, and deleting tasks while the order is being placed. To be able to manage them more efficiently , large amounts of data are summarized with elements in common in an address position in the database. This position is calculated with hash values and is made up of a table with combinations of figures and letters, all of the same length. In this article we explain the basics so you can use tables hash ( hash tables ).

Index
  1. What principle does a hash table follow?
    1. Hash security
    2. Accelerate database processes
  2. What variants of the hash process are there?
  3. Advantages and disadvantages of hash tables

What principle does a hash table follow?

A hash value is first calculated from the data in a record . The hash values of all records in a database are stored in the hash table . By means of another mathematical operation, the location of said information in the database is calculated from the hash value . If the user then enters a term in the search field , this term is also hashed . From then on, therefore, the search is no longer for? Watch with the brown leather strap ?, but a match between the initially hashed value for the item and the hash value of the term searched at that time. In other words, a match is sought between two combinations of numbers and letters. This approach is used for all kinds of applications..

Hash security

A hash table is the result of assigning automatically generated hashes to certain search terms. For this, a hash function is used, which generates a sequence of characters with a constant length called hash . The length of the sequence and the characters that comprise it are established by the hash method that is being used. This method is used, for example, to protect access data against data theft.

image
In a WordPress database, the passwords of registered users are stored in the form of 34-character hashes. From these values, it is not possible to find out the password.

For logins recorded in the image database, the password entered by the user is converted to a hash value using the same procedure. This value is then checked against the corresponding entry in the user_pass field of the database. If both 34-character hashes match, access is granted. In the opposite sense, it is not possible to decrypt a password from a hash value : this is one of the qualities of hash functions ..

If the unauthorized access attempt is based on a brute force attack, the sequence should be tested using all the allowed characters until an exact match is found. If lowercase, uppercase, and numbers are included , and assuming the attacker knows the hash function used, a 10-character password would require exactly 839,299,365,868,340,200 attempts to find the correct combination.

Accelerate database processes

In the database tables are used hash to speed up the search process, registration and deletion of data sets. Let's imagine that a last name is searched in a database that includes all the workers of a company: the task can take a long time because, in a conventional way, each of the fields in the database is searched for a match, one after another (sequentially). If, however, the search term becomes a value hash and then look at the table hash , the process usually take much less time.

How exactly is this done? Each data set is assigned a unique address . The rules that govern this assignment are always the same within each database (001, 002, 003 or 00A1, 00A2, 00A3, etc.). The address is calculated using the hash function .

As a simple example , let's say we have a database with space for 11 entries, in positions 0 to 10. The name Lisa is made up of 4 ASCII characters, each with an ASCII code: L is 76, i is 105, s is 115 and a is 97. You can check this assignment yourself using the numeric keypad: you will see that with [Alt] + 0076, for example, an L is displayed . Then all the ASCII values of Lisa are added , resulting in hash value 393. In this case, the sum of the ASCII values ​​corresponds to a hash function .

This is followed by a calculation of the remainder with integers : 393% 11 (entries that fit in the base) = 35, remainder 8 (in many programming languages ​​the percentage symbol % corresponds to the mathematical operator of the calculation of the remainder). This remainder will determine where in the database (in our example calculation, index number 8 ) Lisa and all the data related to her will be stored . As you may have already guessed, in this type of addressing, the resulting residuals are often repeated. The more storage space and the longer the hash value used, the lower the probability that such repetitions will occur. In the case of the name Alis , even if the letters of the name match those of Lisa, the positioning would be different, since A is uppercase and L is lowercase.

image
Each of the terms stored in the database contains an index value, calculated from the hash value using the hash function. Thus, when searching the data for a person in the database, only the indexes 001? 363? and not the entire database.

In the previous image we can see a repetition problem : it is possible that different names receive the same index values, thus causing so-called collisions (light blue arrows). How does the database react to this obstacle? Locate the record in the closest free position. Thus, if Berta Torres is searched , for example, the search will not start at position 001, but the search pointer will go directly to the index position 003, which means a (slight) acceleration of the process. This method, which is then searched in a linear fashion, is known as open-addressed hashing (or linear probing).

Meanwhile, the method of the hashing chained ( chaining ) works differently: not open any new data set, but goes concatenating the first data set matching to the next. In this way, the searched data set is located with a single search request, since it is located after the first set in the memory location. As a consequence, the processing becomes faster.

image
Chained storage sorts, one after another, under the same index, the data sets that collide, that is, it stores them chained to each other.

When searching for Berta Torres , the data pointer is thus placed from the beginning on the value 003, calculated from the hash table , and from then on she only has to examine a chained set of data. In this way, large amounts of data can be grouped and examined in a faster and more sophisticated way.

The method known as caching also uses this principle to retrieve data that has ever been used . This data is stored in temporary memory or cache, from where it is retrieved in case of a match with an activity pattern. This applies, for example, to web pages that have been visited, so that on a second visit they no longer have to be fully loaded from the server, but rather are retrieved much faster from the cache. The cookies also function as a kind of identification comparison indicates which sites the user visited the web.

What variants of the hash process are there?

The hashing process that we have described in this article is also called open or external hashing , since it can store data in the form of chained lists within an infinite space, at least in theory. Although the keys are limited, chaining allows you to process larger amounts of data. The open rating refers to addressing.

In the case of closed hashing , however, the number of keys is limited by the capacity of the table. If you try to store more data, takes a call overflow or overflow . With each new examination, the table is polled to find free positions in which to locate the overflowed elements .

Note

There are no clear rules that define the meaning of the terms open and closed to qualify hashing processes . In some publications they are even used in the opposite way to what we describe here. It is therefore advisable to refer to a detailed description of each concept.

On the other hand, the so-called cuckoo hash or cuckoo hashing is also applied to avoid collisions in the database table. This method owes its name to the behavior of the cuckoo, which removes the eggs from other people's nests to place its own in them. Similarly, the cuckoo hash applies two hash functions to define two storage locations . If the first position is already occupied, the key located there will be moved to another position, so that the second generated key can be placed in the first position. The disadvantage of this variant is that it can generate an endless search loop, so that a started routine would be interrupted because it would exceed the allotted time.

When it comes to polling the database, there are different methods built on the basis of complex mathematical formulas and hidden in the form of program code on a web page, for example, behind the search bar with the icon of the magnifying glass.

In general, databases become increasingly complex; the amount of data they comprise is increasing faster and faster. For this reason, dynamic hashing increases the hash table to avoid collisions. This, however, causes the hashes of already stored data to be modified as well . To solve this problem efficiently, special hash functions have been developed . With this, the data storage capacity becomes, at least in theory, unlimited , although the search cycles become less efficient.

Advantages and disadvantages of hash tables

The biggest advantage of using a hash table is search speed over large data sets . To achieve this, however, database architects face the challenge of estimating the required size in advance, in order to reduce the risk of collisions. Many different data types can be used , as long as hashes can be calculated from them.

As for the weak points of this method, one of them is the possible serious degeneration of the system if many collisions occur . The probability of collisions increases as the amount of data stored increases. The presence of a large number of hash functions prevents movement from one data set to the previous or the next.


...