My Datastage Notes: March 2013

Monday, March 18, 2013

Unix Interview Questions

1. How to display the 10th line of a file?
head -10 filename | tail -1

2. How to remove the header from a file?
sed -i '1 d' filename

3. How to remove the footer from a file?

sed -i '$ d' filename

4. Write a command to find the length of a line in a file?
The below command can be used to get a line from a file.
sed –n '<n> p' filename
We will see how to find the length of 10th line in a file
sed -n '10 p' filename|wc -c

5. How to get the nth word of a line in Unix?
cut –f<n> -d' '

6. How to reverse a string in unix?
echo "java" | rev

7. How to get the last word from a line in Unix file?
echo "unix is good" | rev | cut -f1 -d' ' | rev

8. How to replace the n-th line in a file with a new line in Unix?
sed -i'' '10 d' filename # d stands for delete
sed -i'' '10 i new inserted line' filename # i stands for insert

9. How to check if the last command was successful in Unix?
echo $?

10. Write command to list all the links from a directory?
ls -lrt | grep "^l"

11. How will you find which operating system your system is running on in UNIX?
uname -a

12. Create a read-only file in your home directory?
touch file; chmod 400 file

13. How do you see command line history in UNIX?
The 'history' command can be used to get the list of commands that we are executed.

14. How to display the first 20 lines of a file?
By default, the head command displays the first 10 lines from a file. If we change the option of head, then we can display as many lines as we want.
head -20 filename
An alternative solution is using the sed command
sed '21,$ d' filename
The d option here deletes the lines from 21 to the end of the file

15. Write a command to print the last line of a file?
The tail command can be used to display the last lines from a file.
tail -1 filename
Alternative solutions are:
sed -n '$ p' filename
awk 'END{print $0}' filename

16. How do you rename the files in a directory with _new as suffix?
ls -lrt|grep '^-'| awk '{print "mv "$9" "$9".new"}' | sh

17. Write a command to convert a string from lower case to upper case?
echo "apple" | tr [a-z] [A-Z]

18. Write a command to convert a string to Initcap.
echo apple | awk '{print toupper(substr($1,1,1)) tolower(substr($1,2))}'

19. Write a command to redirect the output of date command to multiple files?
The tee command writes the output to multiple files and also displays the output on the terminal.
date | tee -a file1 file2 file3

20. How do you list the hidden files in current directory?

ls -a | grep '^\.'

21. List out some of the Hot Keys available in bash shell?

Ctrl+l - Clears the Screen.
Ctrl+r - Does a search in previously given commands in shell.
Ctrl+u - Clears the typing before the hotkey.
Ctrl+a - Places cursor at the beginning of the command at shell.
Ctrl+e - Places cursor at the end of the command at shell.
Ctrl+d - Kills the shell.
Ctrl+z - Places the currently running process into background.

22. How do you make an existing file empty?
cat /dev/null > filename

23. How do you remove the first number on 10th line in file?
sed '10 s/[0-9][0-9]*//' < filename

24. What is the difference between join -v and join -a?
join -v : outputs only matched lines between two files.
join -a : In addition to the matched lines, this will output unmatched lines also.

25. How do you display from the 5th character to the end of the line from a file?
cut -c 5- filename

26. Display all the files in current directory sorted by size?
ls -l | grep '^-' | awk '{print $5,$9}' |sort -n|awk '{print $2}'
Write a command to search for the file 'map' in the current directory?
find -name map -type f
How to display the first 10 characters from each line of a file?
cut -c -10 filename
Write a command to remove the first number on all lines that start with "@"?
sed '\,^@, s/[0-9][0-9]*//' < filename
How to print the file names in a directory that has the word "term"?
grep -l term *
The '-l' option make the grep command to print only the filename without printing the content of the file. As soon as the grep command finds the pattern in a file, it prints the pattern and stops searching other lines in the file.
How to run awk command specified in a file?
awk -f filename
How do you display the calendar for the month march in the year 1985?
The cal command can be used to display the current month calendar. You can pass the month and year as arguments to display the required year, month combination calendar.
cal 03 1985
This will display the calendar for the March month and year 1985.
Write a command to find the total number of lines in a file?
wc -l filename
Other ways to pring the total number of lines are
awk 'BEGIN {sum=0} {sum=sum+1} END {print sum}' filename
awk 'END{print NR}' filename
How to duplicate empty lines in a file?
sed '/^$/ p' < filename
Explain iostat, vmstat and netstat?

Iostat: reports on terminal, disk and tape I/O activity.
Vmstat: reports on virtual memory statistics for processes, disk, tape and CPU activity.
Netstat: reports on the contents of network data structures.

27. How do you write the contents of 3 files into a single file?
cat file1 file2 file3 > file

28. How to display the fields in a text file in reverse order?
awk 'BEGIN {ORS=""} { for(i=NF;i>0;i--) print $i," "; print "\n"}' filename

29. Write a command to find the sum of bytes (size of file) of all files in a directory.
ls -l | grep '^-'| awk 'BEGIN {sum=0} {sum = sum + $5} END {print sum}'

30. Write a command to print the lines which end with the word "end"?
grep 'end$' filename
The '$' symbol specifies the grep command to search for the pattern at the end of the line.

31. Write a command to select only those lines containing "july" as a whole word?
grep -w july filename
The '-w' option makes the grep command to search for exact whole words. If the specified pattern is found in a string, then it is not considered as a whole word. For example: In the string "mikejulymak", the pattern "july" is found. However "july" is not a whole word in that string.

32. How to remove the first 10 lines from a file?
sed '1,10 d' < filename

33. Write a command to duplicate each line in a file?
sed 'p' < filename

34. How to extract the username from 'who am i' comamnd?
who am i | cut -f1 -d' '

35. Write a command to list the files in '/usr' directory that start with 'ch' and then display the number of lines in each file?
wc -l /usr/ch*
Another way is
find /usr -name 'ch*' -type f -exec wc -l {} \;

36. How to remove blank lines in a file ?
grep -v ‘^$’ filename > new_filename

37. How to display the processes that were run by your user name ?
ps -aef | grep <user_name>

38. Write a command to display all the files recursively with path under current directory?
find . -depth -print

39. Display zero byte size files in the current directory?
find -size 0 -type f

40. Write a command to display the third and fifth character from each line of a file?
cut -c 3,5 filename

41. Write a command to print the fields from 10th to the end of the line. The fields in the line are delimited by a comma?
cut -d',' -f10- filename

42 How to replace the word "Gun" with "Pen" in the first 100 lines of a file?
sed '1,00 s/Gun/Pen/' < filename

43. Write a Unix command to display the lines in a file that do not contain the word "RAM"?
grep -v RAM filename
The '-v' option tells the grep to print the lines that do not contain the specified pattern.

44 How to print the squares of numbers from 1 to 10 using awk command
awk 'BEGIN { for(i=1;i<=10;i++) {print "square of",i,"is",i*i;}}'

45. Write a command to display the files in the directory by file size?
ls -l | grep '^-' |sort -nr -k 5

46. How to find out the usage of the CPU by the processes?
The top utility can be used to display the CPU usage by the processes.

47. Write a command to remove the prefix of the string ending with '/'.
The basename utility deletes any prefix ending in /. The usage is mentioned below:
basename /usr/local/bin/file
This will display only file

48. How to display zero byte size files?
ls -l | grep '^-' | awk '/^-/ {if ($5 !=0 ) print $9 }'

49. How to replace the second occurrence of the word "bat" with "ball" in a file?
sed 's/bat/ball/2' < filename

50. How to remove all the occurrences of the word "jhon" except the first one in a line with in the entire file?
sed 's/jhon//2g' < filename

51. How to replace the word "lite" with "light" from 100th line to last line in a file?
sed '100,$ s/lite/light/' < filename

52. How to list the files that are accessed 5 days ago in the current directory?
find -atime 5 -type f

53. How to list the files that were modified 5 days ago in the current directory?
find -mtime 5 -type f

54. How to list the files whose status is changed 5 days ago in the current directory?
find -ctime 5 -type f

55. How to replace the character '/' with ',' in a file?
sed 's/\//,/' < filename
sed 's|/|,|' < filename

56. Write a command to find the number of files in a directory.
ls -l|grep '^-'|wc -l

57. Write a command to display your name 100 times.
The Yes utility can be used to repeatedly output a line with the specified string or 'y'.
yes <your_name> | head -100

58. Write a command to display the first 10 characters from each line of a file?
cut -c -10 filename

59. The fields in each line are delimited by comma. Write a command to display third field from each line of a file?
cut -d',' -f2 filename

60. Write a command to print the fields from 10 to 20 from each line of a file?
cut -d',' -f10-20 filename

61. Write a command to print the first 5 fields from each line?
cut -d',' -f-5 filename

62. By default the cut command displays the entire line if there is no delimiter in it. Which cut option is used to supress these kind of lines?
The -s option is used to supress the lines that do not contain the delimiter.

63. Write a command to replace the word "bad" with "good" in file?
sed s/bad/good/ < filename

64. Write a command to replace the word "bad" with "good" globally in a file?
sed s/bad/good/g < filename

65. Write a command to replace the word "apple" with "(apple)" in a file?
sed s/apple/(&)/ < filename

66. Write a command to switch the two consecutive words "apple" and "mango" in a file?
sed 's/$apple$ $mango$/\2 \1/' < filename

67. Write a command to display the characters from 10 to 20 from each line of a file?
cut -c 10-20 filename

68. Write a command to print the lines that has the the pattern "july" in all the files in a particular directory?
grep july *
This will print all the lines in all files that contain the word “july” along with the file name. If any of the files contain words like "JULY" or "July", the above command would not print those lines.

69. Write a command to print the lines that has the word "july" in all the files in a directory and also suppress the filename in the output.
grep -h july *

70. Write a command to print the lines that has the word "july" while ignoring the case.
grep -i july *
The option i make the grep command to treat the pattern as case insensitive.

71. When you use a single file as input to the grep command to search for a pattern, it won't print the filename in the output. Now write a grep command to print the filename in the output without using the '-H' option.
grep pattern filename /dev/null
The /dev/null or null device is special file that discards the data written to it. So, the /dev/null is always an empty file.
Another way to print the filename is using the '-H' option. The grep command for this is
grep -H pattern filename

72. Write a command to print the file names in a directory that does not contain the word "july"?
grep -L july *
The '-L' option makes the grep command to print the filenames that do not contain the specified pattern.

73. Write a command to print the line numbers along with the line that has the word "july"?
grep -n july filename
The '-n' option is used to print the line numbers in a file. The line numbers start from 1

74. Write a command to print the lines that starts with the word "start"?
grep '^start' filename
The '^' symbol specifies the grep command to search for the pattern at the start of the line.

75. In the text file, some lines are delimited by colon and some are delimited by space. Write a command to print the third field of each line.
awk '{ if( $0 ~ /:/ ) { FS=":"; } else { FS =" "; } print $3 }' filename

76. Write a command to print the line number before each line?
awk '{print NR, $0}' filename

77. Write a command to print the second and third line of a file without using NR.
awk 'BEGIN {RS="";FS="\n"} {print $2,$3}' filename

78. How to create an alias for the complex command and remove the alias?
The alias utility is used to create the alias for a command. The below command creates alias for ps -aef command.
alias pg='ps -aef'
If you use pg, it will work the same way as ps -aef.
To remove the alias simply use the unalias command as
unalias pg

79. Write a command to display todays date in the format of 'yyyy-mm-dd'?
The date command can be used to display todays date with time
date '+%Y-%m-%d'

------------------------------------------------------------------------------------------------------

1) Convert single column to single row:

Input: filename : try

REF_PERIOD
PERIOD_NAME
ACCOUNT_VALUE
CDR_CODE
PRODUCT
PROJECT
SEGMENT_CODE
PARTNER
ORIGIN
BILLING_ACCRUAL

Output:
REF_PERIOD PERIOD_NAME ACCOUNT_VALUE CDR_CODE PRODUCT PROJECT SEGMENT_CODE PARTNER ORIGIN BILLING_ACCRUAL

Command: cat try | awk ‘{printf “%s “,$1}’

2) Print the list of employees in Technology department :

Now department name is available as a fourth field, so need to check if $4 matches with the string “Technology”, if yes print the line.

Command: $ awk ‘$4 ~/Technology/’ employee.txt

200 Jason   Developer Technology $5,500
300 Sanjay Sysadmin   Technology $7,000
500 Randy   DBA        Technology $6,000

Operator ~ is for comparing with the regular expressions. If it matches the default action i.e print whole line will be performed.

3) Convert single column to multiple column :

For eg: Input file contain single column with 84 rows then output should be single column data converted to multiple of 12 columns i.e. 12 column * 7 rows with field separtor (fs ;)

Script:

#!/bin/sh
rows=`cat input_file | wc -l`
 cols=12
 fs=;
awk -v r=$rows -v c=$cols -v t=$fs '
 NR<r*c{printf("%s",NR%c?$0"$":$0"\n");next}{print}
 END{if(NR%c&&NR<r*c){print ""}}' input_file > output_file

4) Last field print:

input:
a=/Data/Files/201-2011.csv

output:
201-2011.csv

Command: echo $a | awk -F/ ‘{print $NF}’

5) Count no. of fields in file:

file1: a, b, c, d, 1, 2, man, fruit

Command: cat file1 | awk ‘BEGIN{FS=”,”};{print NF}’

and you will get the output as:8

6) Find ip address in unix server:

Command: grep -i your_hostname /etc/hosts

7) Replace the word corresponding to search pattern:

 >cat file 
 the black cat was chased by the brown dog.
 the black cat was not chased by the brown dog.

 >sed -e '/not/s/black/white/g' file 
 the black cat was chased by the brown dog. 
 the white cat was not chased by the brown dog.

8) The below i have shown the demo for the “A” and “65″.

Ascii value of character: It can be done in 2 ways:

1. printf “%d” “‘A”
2. echo “A” | tr -d “\n” | od -An -t dC

Character value from Ascii: awk -v char=65 ‘BEGIN { printf “%c\n”, char; exit }’

———————————————————————————————————
9) Input file:

crmplp1 cmis461 No Online
cmis462 No Offline
crmplp2 cmis462 No Online
cmis463 No Offline
crmplp3 cmis463 No Online
cmis461 No Offline

Output –>crmplp1 cmis461 No Online cmis462 No Offline
crmplp2 cmis462 No Online cmis463 No Offline

Command:
awk ‘NR%2?ORS=FS:ORS=RS’ file

———————————————————————————————————

10) Variable can used in AWK

awk -F”$c” -v var=”$c” ‘{print $1var$2}’ filename

———————————————————————————————————

11) Search pattern and use special character in sed command:

sed -e ‘/COMAttachJob/s#”)#.”:JobID)#g’ input_file———————————————————————————————————

12) Get the content between two patterns:sed -n ‘/CREATE TABLE table/,/MONITORING/p’ table_Script.sql———————————————————————————————————

13) Pring debugging script output in log file Add following command in script:
exec 1>> logfilename
exec 2>>logfilename———————————————————————————————————

14) Check Sql connection:#!/bin/sh
ID=abc
PASSWD=avd
DB=sdf
exit | sqlplus -s -l $ID/$PASSWD@$DB
echo variable:$?
exit | sqlplus -s -L avd/df@dfg > /dev/null
echo variable_crr: $?———————————————————————————————————

15) Trim the spaces using sed command

echo “$var” | sed -e ‘s/^[[:space:]]*//’ -e ‘s/[[:space:]]*$//’
Another option is:
Code:
var=$(echo “$var” | sed -e ‘s/^[[:space:]]*//’ -e ‘s/[[:space:]]*$//’)
echo “Start $var End”———————————————————————————————————

16) How to add sigle quote in statement using awk:Input:
/Admin/script.sh abc 2011/08 29/02/2012 00:00:00
/Admin/script.sh abc 2011/08 29/02/2012 00:00:00
command:
cat command.txt | sed -e ‘s/[[:space:]]/ /g’ | awk -F’ ‘ ‘{print \x27″$1,$2,$3″\x27″,”\x27″$4,$5″\x27″}’
output:
‘/Admin/script.sh abc 2011/08′ ’29/02/2012 00:00:00′
‘/Admin/script.sh abc 2011/08′ ’29/02/2012 00:00:00′

Sql queries

Query to display middle records drop first 5 last 5 records in emp table

select * from emp where rownum<=(select count(*)-5 from emp) - select * from emp where rownum<=5;

Query to display first N records

select * from(select * from emp order by rowid) where rownum<=&n;

Query to display odd records only?

Q). select * from emp where (rowid,1) in (select rowid,mod (rownum,2) from emp);

Query to display even records only?

Q.) select * from emp where (rowid,0) in (select rowid,mod (rownum,2) from emp);

How to display duplicate rows in a table?

Q). select * from emp where deptno=any

(select deptno from emp having count(deptno)>1 group by deptno);

Query to display 3rd highest and 3rd lowest salary?

Q). select * from emp e1 where 3=(select count(distinct sal) from emp e2 where e1.sal<=e2.sal)

union

select * from emp e3 where 3=(select count(distinct sal) from emp e4 where e3.sal>=e4.sal);

Query to display Nth record from the table?

Q). select * from emp where rownum<=&n minus select * from emp where rownum<&n;

Query to display the records from M to N;

Q.) select ename from emp group by rownum,ename having rownum>1 and rownum<6;

select deptno,ename,sal from emp where rowid in(select rowid from emp

where rownum<=7 minus select rowid from emp where rownum<4);

select * from emp where rownum<=7 minus select * from emp where rownum<5;

Query to delete the duplicate records?

Q). delete from dup where rowid not in(select max(rowid)from dup group by eno);

Query to display the duplicate records?

Q). select * from dup where rowid not in(select max(rowid)from dup group by eno);

Query for joining two tables(OUTER JOIN)?

Q). select e.ename,d.deptno from emp e,dept d where e.deptno(+)=d.deptno order by e.deptno;

select empno,ename,sal,dept.* from emp full outer join dept on emp.deptno=dept.deptno;

Right Outer Join:

select empno,ename,sal,dept.* from emp right outer join dept on emp.deptno=dept.deptno;

Left Outer Join:

select empno,ename,sal,dept.* from emp left outer join dept on emp.deptno=dept.deptno

Query for joining table it self(SELF JOIN)?

Q). select e.ename “employee name”,e1.ename “manger name” from emp e,emp e1 where e.mgr=e1.empno;

Query for combining two tables(INNER JOIN)?

select emp.empno,emp.ename,dept.deptno from emp,dept where emp.deptno=dept.deptno;

By using aliases:

select e.empno,e.ename,d.deptno from emp e,dept d where e.deptno=d.deptno;

select empno,ename,sal,dept.* from emp join dept on emp.deptno=dept.deptno:

Find the particular employee salary?

for maximum:

select * from emp where sal in(select min(sal)from

(select sal from emp group by sal order by sal desc)

where rownum<=&n);

select * from emp a where &n=(select count(distinct(sal)) from emp b where a.sal<=b.sal);

for minimum:

select * from emp where sal in(select max(sal) from(select sal from emp group by sal order by sal asc) where rownum<=&n);

select * from emp a where &n=(select count(distinct(sal)) from emp b where a.sal>=b.sal)

Find the lowest 5 employee salaries?

Q). select * from (select * from emp order by sal asc) where rownum<6;

Find the top 5 employee salaries queries

select * from (select * from emp order by sal desc) where rownum<6;

Find lowest salary queries

select * from emp where sal=(select min(sal) from emp);

Find highest salary queries

select * from emp where sal=(select max(sal) from emp);

Saturday, March 16, 2013

TRANSFORMER STAGE TO FILTER THE DATA

Take Job Design as below

If our requirement is to filter the data department wise from the file below

samp_tabl
1,sam,clerck,10
2,tom,developer,20
3,jim,clerck,10
4,don,tester,30
5,zeera,developer,20
6,varun,clerck,10
7,luti,production,40
8,raja,priduction,40

And our requirement is to get the target data as below

In Target1 we need 10th & 40th dept employees.

In Target2 we need 30th dept employees.

In Target1 we need 20th & 40th dept employees.

Read and Load the data in Source file

In Transformer Stage just Drag and Drop the data to the target tables.

Write expression in constraints as below

dept_no=10 or dept_no= 40 for table 1

dept_no=30 for table 1

dept_no=20 or dept_no= 40 for table 1

Click ok

Give file name at the target file and

Compile and Run the Job to get the Output

Thursday, March 14, 2013

Shared Container

Suppose that 4 job control by the sequencer like (job 1, job 2, job 3, job 4 )if job 1 have 10,000 row ,after run the job only 5000 data has been loaded in target table remaining are not loaded and your job going to be aborted then.. How can short out the problem.Suppose job sequencer synchronies or control 4 job but job 1 have problem, in this condition should go director and check it what type of problem showing either data type problem, warning massage, job fail or job aborted, If job fail means data type problem or missing column action .So u should go Run window ->Click-> Tracing->Performance or In your target table ->general -> action-> select this option here two option
(i) On Fail -- commit , Continue
(ii) On Skip -- Commit, Continue.
First u check how many data already load after then select on skip option then continue and what remaining position data not loaded then select On Fail , Continue ...... Again Run the job defiantly u get successful massage
----------------------------------------------------------------------------------------------------------

Question: I want to process 3 files in sequentially one by one how can i do that. while processing the files it should fetch files automatically .
Ans:If the metadata for all the files r same then create a job having file name as parameter then use same job in routine and call the job with different file name...or u can create sequencer to use the job..
--------------------------------------------------------------------------------------------------------------------------------------
Parameterize the file name.
Build the job using that parameter
Build job sequencer which will call this job and will accept the parameter for file name.
Write a UNIX shell script which will call the job sequencer three times by passing different file each time.
RE: What Happens if RCP is disable ?
In such case Osh has to perform Import and export every time whenthe job runs and the processing time job is also increased...
--------------------------------------------------------------------------------------------------------------------
Runtime column propagation (RCP): If RCP is enabled for any job and specifically for those stages whose output connects to the shared container input then meta data will be propagated at run time so there is no need to map it at design time.
If RCP is disabled for the job in such case OSH has to perform Import and export every time when the job runs and the processing time job is also increased.
Then you have to manually enter all the column description in each stage.RCP- Runtime column propagation

Question:
Source: Target

Eno Ename Eno Ename
1 a,b 1 a
2 c,d 2 b
3 e,f 3 c

Tuesday, March 12, 2013

Difference Between Join,Lookup and Merge

Datawarehousing Concepts

According to Ralph Kimball:

A datawarehouse is a specially designed RDBMS. The data stored in this database should be useful to query the business and analyse the business rather than transaction processing.

According to W.H. Inman:

A datawarehouse is a specially designed RDBMS. The data stored in this database should support 4 characteristic features:
1. Subject Oriented-Datawarehouses are designed as a subject oriented that are used to analyze the business by top level management (or) middle level management (or) for individual departments in an enterprise.
The data in OLTP system is stored in such a way that subject oriented attributes stored in different subject areas( sales rep ID stored in sales schema,Product in Product schema )
2. Integrated--It contains business information collected from various operational data source.
If a particular attribute is common among different source systems which is in different format, has to be loaded in a single standardize format in DWH is called intergration
3.Time Variant- A datawarehouse is atime varient database which allows you to analyze and compare the business with respect to various time periods( Year,Quarter,Month,Week,Day)
4. Non-Volatile-A datawarehouse is a non-volatile database that means once the data entered into dwh can not change.

Dimensional Table: A dimensional table consists of textual representation of the business process( Allows browsing categories quickly and easily)

Fact Tables: A fact table typical includes two types of cols facts cols and foreign keys to dimension. It consists of measurements,metrics or facts of a business process.

Dimensions and Facts

Slowly Changing Dimensions:

Attributes of a dimension that would undergo changes over time. It depends on the business requirement whether particular attribute history of changes should be preserved in the data warehouse. This is called a Slowly Changing Attribute and a dimension containing such an attribute is called a Slowly Changing Dimension.

Rapidly Changing Dimensions:

A dimension attribute that changes frequently is a Rapidly Changing Attribute. If you don’t need to track the changes, the Rapidly Changing Attribute is no problem, but if you do need to track the changes, using a standard Slowly Changing Dimension technique can result in a huge inflation of the size of the dimension. One solution is to move the attribute to its own dimension, with a separate foreign key in the fact table. This new dimension is called a Rapidly Changing Dimension.

Junk Dimensions:

A junk dimension is a single table with a combination of different and unrelated attributes to avoid having a large number of foreign keys in the fact table. Junk dimensions are often created to manage the foreign keys created by Rapidly Changing Dimensions.

Inferred Dimensions:

While loading fact records, a dimension record may not yet be ready. One solution is to generate an surrogate key with Null for all the other attributes. This should technically be called an inferred member, but is often called an inferred dimension.

Conformed Dimensions:

A Dimension that is used in multiple locations is called a conformed dimension. A conformed dimension may be used with multiple fact tables in a single database, or across multiple data marts or data warehouses.

Degenerate Dimensions:

A degenerate dimension is when the dimension attribute is stored as part of fact table, and not in a separate dimension table. These are essentially dimension keys for which there are no other attributes. In a data warehouse, these are often used as the result of a drill through query to analyze the source of an aggregated number in a report. You can use these values to trace back to transactions in the OLTP system.

Role Playing Dimensions:

A role-playing dimension is one where the same dimension key — along with its associated attributes — can be joined to more than one foreign key in the fact table. For example, a fact table may include foreign keys for both Ship Date and Delivery Date. But the same date dimension attributes apply to each foreign key, so you can join the same dimension table to both foreign keys. Here the date dimension is taking multiple roles to map ship date as well as delivery date, and hence the name of Role Playing dimension.

Shrunken Dimensions:

A shrunken dimension is a subset of another dimension. For example, the Orders fact table may include a foreign key for Product, but the Target fact table may include a foreign key only for ProductCategory, which is in the Product table, but much less granular. Creating a smaller dimension table, with ProductCategory as its primary key, is one way of dealing with this situation of heterogeneous grain. If the Product dimension is snowflaked, there is probably already a separate table for ProductCategory, which can serve as the Shrunken Dimension.

Static Dimensions:

Static dimensions are not extracted from the original data source, but are created within the context of the data warehouse. A static dimension can be loaded manually — for example with Status codes — or it can be generated by a procedure, such as a Date or Time dimension.

Types of Facts -

Additive:

Additive facts are facts that can be summed up through all of the dimensions in the fact table. A sales fact is a good example for additive fact.

Semi-Additive:

Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.
Eg: Daily balances fact can be summed up through the customers dimension but not through the time dimension.

Non-Additive:

Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
Eg: Facts which have percentages, ratios calculated.

Factless Fact Table:

In the real world, it is possible to have a fact table that contains no measures or facts. These tables are called “Factless Fact tables”.

Eg: A fact table which has only product key and date key is a factless fact. There are no measures in this table. But still you can get the number products sold over a period of time.

Based on the above classifications, fact tables are categorized into two:

Cumulative:

This type of fact table describes what has happened over a period of time. For example, this fact table may describe the total sales by product by store by day. The facts for this type of fact tables are mostly additive facts. The first example presented here is a cumulative fact table.

Snapshot:

This type of fact table describes the state of things in a particular instance of time, and usually includes more semi-additive and non-additive facts. The second example presented here is a snapshot fact table.

DATA MINING

Data mining is the process of finding patterns from large data sets and analyzing data from different perspectives. It allows business users to analyze data from different angles and summarize the relationships identified. Data mining can be useful in increasing the revenue and cut costs.

Example:

In a supermarket, the persons who bought the tooth brush on Sundays also bought tooth paste. This information can be used in increasing the revenue by providing an offer on tooth brush and tooth paste. There by selling more number of products (tooth paste and tooth brush) on Sundays.

Data mining process:

Data mining analyzes relationships and patterns in the stored data based on user queries. Data mining involves four tasks.

Association: Find the relationship between the variables. For example in retail a store, we can determine which products are bought together frequently and this information can be used to market these products.
Clustering: Identifying the logical relationship in the data items and grouping them. For example in a retail store, a tooth paste, tooth brush can be logically grouped.
Classifying: Involves in applying a known pattern to the new data.

Pages