Search This Blog

Friday, May 22, 2009

Ever Hear of Netezza

Netezza performance is something that is out of this world, for someone like me coming from a T-SQL and relational background. I think even if you had considerable BI experience with CUBES and ETL processes your experience would not be considered complete if you haven't dealt with data in the terabytes.
I think i am a bit lucky to be working on this.

Consider some of the technology behind Netezza server. It is a server rack with 28 computers each having its own memory, sql application, processor. Each server has a fail over mechanism for an alternate server.
It has a concept of bringing the query to the data, that means that query is not loaded into memory but fetched directly from the disk, so i guess that would mean A big okay for table scans and no Indexes?
Rather than moving data into memory or across the network for processing, filters records as they stream off the disk, delivering only the relevant information for each query.


Netezza also uses F.P.G.A (Field programmable Gate Array) at the processor level. So this means you can flash the processor with a snippet of code and it then gets optimized since it gets built as a machine language instruction.
This to me is the coolest concept i have ever heard of when it comes to DW servers.

Wednesday, April 29, 2009

Deals site

My favorite deal site, good aggregator. http://www.deals2buy.com/

Friday, April 24, 2009

ANYthing will do as EveryThing or All

I must admit i have never had to use the ANY and ALL keywords in my queries and i could not find any simple explanation. So here is my understanding of what differentiates them.
ANY, compares the result to ANY values
ALL, compares the result to ALL values.








Product CostProduct Name
5Apples
10Orange
20Banana


Query using ALL keyword will return
SELECT [Product Name] FROM Product WHERE [Product Cost] >= ALL
(SELECT MAX ([Product Cost]) FROM Product)
This query will return a value that is greater than the max value.

Below sample explanation is from MSDN which breaks it down nicely.
For example, >ALL (1, 2, 3) means greater than 3. >ANY means greater than at least one value, that is, greater than the minimum. So >ANY (1, 2, 3) means greater than 1.

Joining NTEXT Fields

Tables cannot be joined directly on ntext, text, or image columns. However, tables can be joined indirectly on ntext, text, or image columns by using SUBSTRING. For example, SELECT * FROM t1 JOIN t2 ON SUBSTRING(t1.textcolumn, 1, 20) = SUBSTRING(t2.textcolumn, 1, 20) performs a two-table inner join on the first 20 characters of each text column in tables t1 and t2. In addition, another possibility for comparing ntext or text columns from two tables is to compare the lengths of the columns with a WHERE clause, for example: WHERE DATALENGTH(p1.pr_info) = DATALENGTH(p2.pr_info)

Source MSDN - http://msdn.microsoft.com/en-us/library/ms191517.aspx

Thursday, April 23, 2009

Sql Server Optimization Steps

You can definitely find a lot of articles out on the internet showing different techniques for optimization of Sql server. What follows is actual implementation or rather sequence of steps that i did to tune the database for improved performance and speed.

The first step is to find which of the queries are slow performing. Most of the times it is only after the client or end user starts complaining that the application is slow.
The right way to go about this is to think about performance when you go about designing the database, tables, stored procedures and indexes.

SETTING AND CHOOSING INDEXES


There are only two types of indexes available Clustered and Non clustered indexes.
There can only be one clustered index and that is how the actual data will be stored in the database. You can have several non-clustered indexes according to MSDN the maximum number is around 249 indexes.
To give a real world explanation of clustered and non-clustered index, think of a yellow book. The index at the end is the non-clustered index, it gives a pointer to the where you can find the actual data.

Some of the golden rules while going about optimization are

(1)Selectivity A good rule for non-clustered index is the percentage of records that are unique, the higher the value the better the selectivity. For example a column of type boolean will have good selectivity since it will have a 50% unique selectivity on the account of possible values being True or False.
Keep in mind that setting a non-clustered index might not always be better since it will have to perform an actual lookup on the clustered index to find the actual data.
If the index is going to perform multiple lookups then there can be instances where an actual table scan might perform better.

(2)Exception to Selectivity When foreign keys come into picture the rule of selectivity can be thrown out of the window. Why foreign keys and not other columns is because it improves the join performance as it performs a MERGE JOIN internally. More on the different types of joins later.
This brings up the point that selectivity is not everything,and the question you should ask is how often is this field going to be used.

(3)Index decrease performance Though a well designed index speeds up performance they can be slow while trying to update data. Anytime data is modified any index related to that data also needs to be updated.

(4)Don't Accept the Default Clustered index when you define a key as primary by default SQL Server sets this column as a clustered index.
A clustered index should be defined on a column that participates in a range query for example a column that takes part in a between, <, >, MAX, MIN and COUNT aggregates. The reason why this works well is because the query can do directly to the data and keep reading till it gets to the end.

(5)Keep Index Keys Short
The larger an index key is, the harder a database has to work to use the index. For instance, an integer key is smaller in size then a character field for holding 100 characters. In particular, keep clustered indexes as short as possible.
There are several approaches to keeping an index key short. Try to limit the index to as few columns as possible. While composite indexes are useful and can sometimes optimize a query, they are also larger and cause more disk reads for the database. Choose a data type for an index column with less number of bytes required for each data type. Integer keys are small and easy for the database to compare. In contrast, strings require a character-by-character comparison.

As a rule of thumb, try to avoid using character columns in an index. Integer columns will always have an advantage over character fields in ability to boost the performance of a query.

INDEX MAINTENANCE


Keep in mind that while you go about setting indexes the problem of maintenance for indexes can be something that can cause you nightmares down the line after the database has grown significantly. Two main things to consider are Index fragmentation and Page Splits.

(1)Page Splits

COMING UP - FRAGMENTATION

COMING UP - GUUID EXAMPLE FOR BAD FRAGMENTATION