For my advanced database systems course I needed to learn how to take a given relation and functional dependencies, tell the highest normal form and then normalize it up to BCNF. It’s actually not that hard, but there are a lot of pitfalls to watch out for. Here I’m going to show the methods I learned to solve the exam questions. I need to give credit to the mentors and StinaQ for their help and tutorials!

### Visualizing the dependencies

You start with a given relation and its dependencies. For example:

1 2 3 |
R(A,B,C) A → B B → {A,C} |

The trick is to draw boxes in order to visualize the dependencies (ignore the right hand side CK/NP box for now):

Note that B->{A,C} was broken down to B->A and B->C thanks to the decomposition rule.

Let’s try a more tricky one:

1 2 3 4 5 |
R(A,B,C,D,E,F) A → B B → A {B,C} → D C → E |

Note that B and C in this relation is a *composite key* that together determine D. Thus we draw a box around both of them to illustrate this dependency. And don’t forget F, even though nothing is pointing to it!

### Candidate Keys & Non-Primes

It’s time to determine the *candidate keys* (CK:s). Recall that a CK is one or more attributes that with which you can determine all other attributes in the relation. (In the end, one of the CK:s will be chosen to be the primary key.)

To find the candidate keys:

- Begin by selecting a start box (this can be a composite key too). If there is a box at which no arrow is pointing, this is usually a good place start. Follow the outwards arrows to the next box, and from this box to the next and so forth. The goal is to find find a way that covers all boxes.

- If you can’t cover all boxes, try another starting box. If no single attribute (i.e. box) is able to cover all boxes by itself, the CK needs to include one or more additional attributes.

- There can be many candidate keys, so repeat this until you are sure you haven’t missed anything.

- If no arrow points at a box, the rule is that it must be part of
*all*CK:s. This includes stand-alone boxes like F in example 2.

- A candidate key is a minimal super key.
*Minimal*means that you can’t remove any attribute from the key and still determine all other attributes. Note that you can still have keys of different sizes. For example, {A} and {B,C} in Example 3 below can both be candidate keys. But {A,B,C} would not be minimal since you could remove either {A} or {B,C} from it and still determine the relation.

Now that all CK:s are determined, also write down the* non-primary attributes* (NP). The Non-Primes are simply all attributes that are not part of *any* CK.

### Finding the highest normal form

It’s time to determine the highest normal form of the relation. I will not use the exact formal definitions here, but you should learn them too in order to understand what’s happening here, especially if you need to motivate your answer on an exam!

**Is the relation in 1NF?**Yes, always!

**Is the relation in 2NF?**Is there a composite CK? If not, 2NF is automatically achieved. Otherwise, check your boxes. In*any*of your composite CKs: does a part of it (i.e. a box inside the composite box) by itself determine (point to) a non-prime attribute? If yes, NF2 does not pass (and the relation is thus in 1NF).

**Is the relation in 3NF?**Does the relation have any non-primes? If no, the relation is automatically in 3NF! Otherwise: Is there a non-prime that determines (point to) another non-prime (resulting in a transitive dependency because the last non-prime is determined indirectly)? If yes, 3NF does not pass (and the relation is thus in 2NF).

**Is the relation in BCNF?**Are all the determinants also candidate keys? If yes, BCNF pass. This is easiest to check by looking at the dependencies you were given (e.g. A->B and B->{A,C} in example 1) and check that each key (left hand side of the arrow) is in your list of CKs.

Let’s try this! Here is another example:

1 2 3 4 |
R(A,B,C,D,E) A → {B,C} {B,C} → A,D D → E |

First find the CK:s. We can easily see that {B,C} together determines A and D. Since D also determines E, starting with {B,C} can determine all other attributes and is thus a CK. But wait! Since A determines {B,C} it must be a CK on its own!

Now that we have the CK:s, we see that D and E are not part of any CK, so they are non-primary attributes.

All relations are already in 1NF, so we note this. How about 2NF? There is a composite key {B,C} so we need to check if part of it can determine a non-prime on it’s own? Not a problem; this relation is in 2NF! 3NF say that a non-prime must not determine another non-prime. Here is a problem, since D determines E. Thus 3NF does not pass, and the relation only achieves 2NF!

Note the circle I draw to indicate the highest NF, e.g. 2NF in the above example. Since 3NF did not pass, the *highest* NF is 2NF. If you don’t do this, it is easy to make a mistake and pick the last one as the highest NF.

### Tips

- When you have finished drawing the boxes, do a final count of all attributes in R() and make sure they are all present as boxes. It’s very easy to forget the ones without any dependencies otherwise!

- Remember that for example a 3NF relation is also in 2NF. If an exam question is “is the relation in 2NF?” and you determine that the
*highest*NF is 3NF, then the answer to the question is true. Don’t mix up “highest” vs “is in”.

- This page allows you to check that you have correctly determined the CKs and normal forms. It’s a bit quirky to use, but very convenient!

In the next post I will continue this tutorial and show how to normalize the relations to make sure they are in BCNF!

*Please note that I do not have time to answer questions about this topic in the comments.*

Pingback: Method for normalizing a relation to BCNF based on functional dependencies | RLV Blog

That was really helpful…Thanks

If CK is minimal, why did you include {B, C} as a CK for example 3?

Both {A} and {B,C} are candidate keys in this example. Even if {A} “looks smaller” than {B,C}, {B,C} is still a minimal CK, because if you were to remove either B or C from {B,C} it would no longer be able to determine the relation. Thus it is minimal. {A,B,C} would not be a minimal key, because you could remove either A or B,C from it and it would still determine the relation. Remember that we can have many CKs (but only one is chosen in the end as the primary key, PK)

very helpful……..Thank you very much….

when Highest Normal form is asked,do we need to check normal form with all keys and select the normal form (or) select the Highest normal form formed by one of it’s keys???

For example, if

A -> BCDEF

BC ->ADEF

B ->F

D ->E is given….

then keys =A,{B,C}.

NF achieved by A is 2NF

NF achieved by {B,C} is 1NF…. so what is Highest Normal form??? 1NF or 2NF is my doubt…..

and some where i saw asking “what is BEST normal form” ….can you clarify this also .

Thanks in advance.

This page gives then answer:

Rule: 2NF: Each non-prime-attributes has to be fully functionally dependent from each candidate key.

But: The non-prime-attribute F is not fully functionally dependent from the candidate-key {B, C} because e.g. {B} also identifies the attribute.

Thus it is in 1NF.

The key here is to divide into primes and non-primes.

Thanks a lot…now I got it right….what about BEST normal form….are best and highest normal forms,both same…..i got doubt because finally,we select one of keys as primary key…so if BEST normal form is asked,we need to choose a key that provides us highest normal form…

Thanks in advance

Good question.

Bestis subjective, so if that is the question, I suggest to ask your teacher to clarify. Theoretically highest is probably always best,but in some practical situations it is better to keep a lower normal form because for example performance issues.okay…Thank you for spending time on my questions….i am learning a lot from your blogs…please keep posting as many as possible.

Thank you,

Very nice artcile and helpfull, keep writting 🙂

Thankyou so much. This is very helpfull.

Can you please explain about redundancy in these examples.

Sorry, but my understanding of redundancy in this context is limited. Perhaps some other reader can help?

Dan, thank you very much for such a great explanation! It is really hard to understand how to find CK in FD, but you made it clear! super!

Thanks… It’s very helpful 🙂

Can you please give me a C code for this procedure!

Sorry, but I have no such code…

Thank you very much for your detail article. Finally did I find something that is really useful and helps me understands clearly about how to determine CK and NF from FDs.

can you help me in finding the normal for relation :

R(a,b,c,d,e)

Functional Dependencies : A->B

BC->E

ED->A

hi prem,

Diagrammatic representation of the FDs is:

A -> B |

C |-> E |

D | -> A

How to find the CKs:

‘A’ uniquely cant determine all the attributes. It needs ‘C’ and ‘D’. Therefore CK is { A, C, D }

Similarly, if we have ‘B’ and ‘C’, we need ‘D’ to determine all other attributes, so CK is { B, C, D }

Finally if we have ‘E’ and ‘D’, we need ‘C’ to determine all other attributes, so CK is { C, E, D }

So we have 3 CKs:

1. { A, C, D}

2. { B, C, D }

3. {C, E , D}

Non prime attributes are ones that are not part of any CK. Therefore they are ‘C’ and ‘D’

Now to check all NFs:

i) Any given relation by definition of a relation is ‘already’ in 1NF

ii) To find out 2NF:

Is there any composite Key? Ans: Yes

Then does any part of our composite keys independently determine any of the non-prime attributes?

Ans: ‘C’ and ‘D’ cannot be independently determined by any part of any CK

So 2NF condition is passed

iii. To find out 3NF:

Does the relation have any non-primes? Ans: Yes

Then, is there a non-prime that determines (point to) another non-prime?

Ans: Neither ‘C’ determines (points to) to ‘D’ nor vice versa.

So 3NF condition is passed

iv. To find out BCNF

In all given FDs, are all the determinants also candidate keys?

No, neither ‘A’ nor ‘BC’ nor ‘ED’ are candidate keys. Therefore BCNF condition does not pass.

Highest NF is 3NF

A->BC

B->D

CD->E

E->A

Its highest normal fom is 3NF or BCNF

CK:-A,BC,CD,E

All attributes in a relation is a key or part of key..then blindly say it is in 3NF.

it’s time for check BCNF,

in BCNF every determinent is CK then check one by one.

A determinent is CK.

B determinent is not a CK is a part of a key.

conclusion:-highest NF is 3NF

Dear Sir

You said that non-primes do not exist in the CK’s but both “C” and “D” do exist in the CK’s … sir could you please explain this querry to me.

Thanking You

sorry wrong comment , pardon me

Hey

You said that non-primes do not exist in the CK’s but both “C” and “D” do exist in the CK’s … sir could you please explain this querry to me.

Thanking You

Hi Amit,

Thanks for such a deep explanation but i am confuse about non prime attribute as non prime attribute should not should not be part of any candidate key but c and d are part of candidate key so they are non prime.

Super….really helpful…thanks a lot

It’s really helpful, thank you very much.

What if you have a very complex set of FDs which you cannot put in diagram form? What do you do then? Example if you have up to 8 FDs and only one of that 8 has one attribute on the LHS. All others have 2 or 3 attributes. Its going to be really complex drawing the box diagram for such a situation. So what do you do?

If the drawings become too complex, then I think you need to really dig in to the theory behind functional dependencies. There is probably some deep mathematical thinking behind it that will help you to a solution, but I’m not the right guy to ask.

Really it’s helpful

great and very helpful …. thanks for this ..

It’s very helpful.

Thanks a lot !!

my searching ends here…..thanks a lot!!

if D->E in a relation R then E->D also possible?

Yes, one of the axioms is reflection.

F = {AB→C, C→B, BC→D}

Find highest normal form?

i think it is in 3NF

Can u list the candidate keys here?

Thanks for this article DAN…… Keep on posting

what if in a f.d,collection of prime and non prime attribute determines non prime attribute.Which normal form is that?

firstly i would like to thank this blog and Dan sir for a simpler explanation to identify CNs.

i have a dought in ur example 1:, if a->b and b->{a,c} then according to formal definition of 3NF a->b,b->c so a->c. hence this relation should not be in 3NF . please help me out from this confusion. hoping a solution from this blog. thanking you in advance.

The definition strictly says that no non-prime attribute should determine another non-prime attribute.

In this case, we can see that is true.

gr8 work buddy (Y)

I think we have a bit of an error here. In Example 3, we list two different candidate keys. However, they are of different sizes. The candidate key should be the minimally sized one.

Is the candidate key then just “A”?

Thank you very much! Your post helped me to get through the exams. 🙂

R(a,b,c,d,e)

Functional Dependencies : A->B

BC->E

ED->A

How to find a non-prime

Hi, thanks for the article

but why in the second example {A,C,F} is a candidate key ?

can D be determined by {A,C,F} ?

Firstly Thanks a lot for the really helpful article.

I’ve got 2 questions though:

1) In example 2, why is {ACF} a candidate key? A leads to B and C leads to E (and F is included in the Key), but between the 3 of them none independently leads to D and neither do any of them together. The only path to D was through {B,C} collectively, so I’m assuming C alone cant lead to D. Then either:

(i) {B,C,F} is the sole candidate and {ACF} is not valid as they cant lead to D or

(ii) The {ACF} CK should include B or D to solve the issue: as either {A,B,C,F} or {A,C,D,F}

Which leads to my second question:

2) When talking about minimal CKs, you only mentioned situations where ex: CKs are: A and {B,C} then {A,B,C} is not a CK since you can drop either half and the CK still holds true.

But what about when {B,C,F} is a CK and A is not. Can {A,B,C,F} be a CK or is it not valid since you can drop A and still have a valid CK even though vice versa isnt true?

[I really hope my questions make sense to you]

Thanks again!

Very Nice article…….:)

Its really help full article..

superb…. remove my all confusion

The following set of functional dependencies in a relation R(A,B,C,D,E,F) IS:

A->B,

A->C

D->E

AD->F

The relation is 1NF,2NF,3NF or BCNF?

it’s really a helpful article

Is it necessary any given set of functional dependencies is in 1NF?

R(A,B,C,D,E)

AB->DE;C->E;D->C;E->A

find out the canditate keys and is this is 3nf and bcnf?

This is the clearest, most amazing guide to NF’s I’ve come across. Well done, marvelous! 🙂

hello

can you please draw the diagram for: AB->CD , ABC->E,C->E

AND please explain the candidate keys

That was a very nice article.It definitely helped me to understand how to determine CK.

1)How can you make a relation to be in 2NF or 3NF when previously it was not in those forms.2)How to determine primary key and all foreign keys based on the functional dependencies.It will be better if you reply me with examples.

R(ABCDEF)

F={AB—D, F—C, E—AF }

What are the candidate key of this relation?

What is the highest normal form of this relation?

Why A is added in the ck since only BC could have determined the relation…

1NF is not automatic. It is not in 1NF if there are multivalued attributes in the table…

Very helpful for me, to determine normal Form by any given relationship & related dependencies.

Thanks alot. It really means a lot.

You saved my life, I am eternally grateful.