How do database indexes work/How to Create database index
Let’s start out our tutorial
and explanation of why you would need a database index by going through a very
simple example
SUPPOSE we have a database table named EMP as:
desc emp
Name Null Type
-------- -------- ------------
EMPNO NOT NULL
NUMBER(4)
ENAME
VARCHAR2(10)
JOB
VARCHAR2(9)
MGR
NUMBER(4)
HIREDATE
DATE
SAL NUMBER(7,2)
COMM
NUMBER(7,2)
DEPTNO
NUMBER(2)
And we query as:
SELECT * FROM emp WHERE ename =
'KING'
OUTPUT:
EMPNO
ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------------------- ---------- ---------
---------------------- ------------------------- ----------------------
---------------------- ----------------------
7839
KING PRESIDENT 17-11-81 5000
10
What
would happen without an index on the table?
Once we run that query, what exactly goes on behind the scenes
to find employees who are named KING? Well, the database software would
literally have to look at every single row in the Employee table to see if the Enamefor
that row is ‘KING’. And, because we want every row with the name ‘KING’ inside
it, we can not just stop looking once we find just one row with the name
‘KING’, because there could be other rows with the name KING. So, every row up
until the last row must be searched – which means thousands of rows in this
scenario will have to be examined by the database to find the rows with the
name ‘KING’. This is what is called a FULL
TABLE SCAN....
How a
database index can help performance..
The whole point of having an index is to speed up search queries
by essentially cutting down the number of records/rows in a table that need to
be examined.
What is an index?
So, what is an index? Well, an index is a data structure (most
commonly a B- tree) that stores the values for a specific column in a table. An
index is created on a column of a table. So, the key points to remember
are that an index consists of column values from one table, and that those
values are stored in a data structure. The index is a data structure – remember
that.
How does a hash table index work?
Hash tables are another data structure that you may see being
used as indexes – these indexes are commonly referred to as hash indexes. The
reason hash indexes are used is because hash tables are extremely efficient
when it comes to just looking up values. So, queries that compare for equality
to a string can retrieve values very fast if they use a hash index. For
instance, the query we discussed earlier (SELECT * FROM Employee WHERE Ename= ‘KING’)
could benefit from a hash index created on the Ename column. The way a hash
index would work is that the column value will be the key into the hash table
and the actual value mapped to that key would just be a pointer to the row data
in the table. Since a hash table is basically an associative array, a typical
entry would look something like “KING => 0×28939″, where 0×28939 is a
reference to the table row where KING is stored in memory. Looking up a value
like “KING” in a hash table index and getting back a reference to the row in
memory is obviously a lot faster than scanning the table to find all the rows
with a value of “KING” in the Ename column.
The disadvantages of a hash index
Hash tables are not sorted data structures, and there are many types
of queries which hash indexes can not even help with. For instance, suppose you
want to find out all of the employees who are less than 40 years old. How could
you do that with a hash table index? Well, it’s not possible because a hash
table is only good for looking up key value pairs – which means queries that
check for equality (like “WHERE name = ‘KING’”). What is implied in the key
value mapping in a hash table is the concept that the keys of a hash table are
not sorted or stored in any particular order.
What are some other types of indexes?
Indexes that use a R- tree data structure are commonly used to
help with spatial problems. For instance, a query like “Find all of the
Starbucks within 2 kilometers of me” would be the type of query that could show
enhanced performance if the database table uses a R- tree index.
What exactly is inside a database index?
So, now you know that a database index is created on a column in
a table, and that the index stores the values in that specific column. But, it
is important to understand that a database index does not store the values in
the other columns of the same table. For example, if we create an index on the Ename
column, this means that the Job and emp_no or dept_no column values are not also stored in the index. If we did just
store all the other columns in the index, then it would be just like creating
another copy of the entire table – which would take up way too much space and
would be very inefficient.
How does a database know when to use an index?
When a query like “SELECT * FROM Employee WHERE Ename= ‘KING’ ”
is run, the database will check to see if there is an index on the column(s)
being queried. Assuming the Ename column does have an index created on it, the
database will have to decide whether it actually makes sense to use the index
to find the values being searched – because there are some scenarios where it
is actually less efficient to use the database index, and more efficient just
to scan the entire table.
Can you force the database to use an index on a query?
Generally, you will not tell the database when to actually use
an index – that decision will be made by the database itself. Although it is
worth noting that in most databases (like Oracle and MySQL), you can actually
specify that you want the index to be used.
How to create an index in SQL:
Here’s what the actual SQL would look like to create an index on
the Employee_Name column from our example earlier:
CREATE INDEX name_index
ON Emp (Ename)
How to create a multi-column index in SQL:
We could also create an index on two of the columns in the
Employee table , as shown in this SQL:
CREATE INDEX name_index
ON Employee (ENAME, HIREDATE)
What is the cost of having a database index?
So, what are some of the disadvantages of having a database
index? Well, for one thing it takes up space – and the larger your table, the
larger your index. Another performance hit with indexes is the fact that
whenever you add, delete, or update rows in the corresponding table, the same
operations will have to be done to your index. Remember that an index needs to
contain the same up to the minute data as whatever is in the table column(s)
that the index covers.
As a general rule, an index should only be created on a table if
the data in the indexed column will be queried frequently.