# Method for determining candidate keys and highest normal form of a relation based on functional dependencies

For my advanced database systems course I needed to learn how to take a given relation and functional dependencies, tell the highest normal form and then normalize it up to BCNF. It’s actually not that hard, but there are a lot of pitfalls to watch out for. Here I’m going to show the methods I learned to solve the exam questions. I need to give credit to the mentors and StinaQ for their help and tutorials!

### Visualizing the dependencies

The trick is to draw boxes in order to visualize the dependencies (ignore the right hand side CK/NP box for now):

Example 1

Note that B->{A,C} was broken down to B->A and B->C thanks to the decomposition rule.

Let’s try a more tricky one:

Example 2

Note that B and C in this relation is a composite key that together determine D. Thus we draw a box around both of them to illustrate this dependency. And don’t forget F, even though nothing is pointing to it!

### Candidate Keys & Non-Primes

It’s time to determine the candidate keys (CK:s). Recall that a CK is one or more attributes that with which you can determine all other attributes in the relation. (In the end, one of the CK:s will be chosen to be the primary key.)

To find the candidate keys:

• Begin by selecting a start box (this can be a composite key too). If there is a box at which no arrow is pointing, this is usually a good place start. Follow the outwards arrows to the next box, and from this box to the next and so forth. The goal is to find find a way that covers all boxes.
• If you can’t cover all boxes, try another starting box. If no single attribute (i.e. box) is able to cover all boxes by itself, the CK needs to include one or more additional attributes.
• There can be many candidate keys, so repeat this until you are sure you haven’t missed anything.
• If no arrow points at a box, the rule is that it must be part of all CK:s. This includes stand-alone boxes like F in example 2.
• A candidate key is a minimal super key. Minimal means that you can’t remove any attribute from the key and still determine all other attributes. Note that you can still have keys of different sizes. For example, {A} and {B,C} in Example 3 below can both be candidate keys. But {A,B,C} would not be minimal since you could remove either {A} or {B,C} from it and still determine the relation.

Now that all CK:s are determined, also write down the non-primary attributes (NP). The Non-Primes are simply all attributes that are not part of any CK.

### Finding the highest normal form

It’s time to determine the highest normal form of the relation. I will not use the exact formal definitions here, but you should learn them too in order to understand what’s happening here, especially if you need to motivate your answer on an exam!

• Is the relation in 1NF? Yes, always!
• Is the relation in 2NF? Is there a composite CK? If not, 2NF is automatically achieved. Otherwise, check your boxes. In any of your composite CKs: does a part of it (i.e. a box inside the composite box) by itself determine (point to) a non-prime attribute? If yes, NF2 does not pass (and the relation is thus in 1NF).
• Is the relation in 3NF? Does the relation have any non-primes? If no, the relation is automatically in 3NF! Otherwise: Is there a non-prime that determines (point to) another non-prime (resulting in a transitive dependency because the last non-prime is determined indirectly)? If yes, 3NF does not pass (and the relation is thus in 2NF).
• Is the relation in BCNF? Are all the determinants also candidate keys? If yes, BCNF pass. This is easiest to check by looking at the dependencies you were given (e.g. A->B and B->{A,C} in example 1) and check that each key (left hand side of the arrow) is in your list of CKs.

Let’s try this! Here is another example:

Example 3

First find the CK:s. We can easily see that {B,C} together determines A and D. Since D also determines E, starting with {B,C} can determine all other attributes and is thus a CK. But wait! Since A determines {B,C} it must be a CK on its own!

Now that we have the CK:s, we see that D and E are not part of any CK, so they are non-primary attributes.

All relations are already in 1NF, so we note this. How about 2NF? There is a composite key {B,C} so we need to check if part of it can determine a non-prime on it’s own? Not a problem; this relation is in 2NF! 3NF say that a non-prime must not determine another non-prime. Here is a problem, since D determines E. Thus 3NF does not pass, and the relation only achieves 2NF!

Note the circle I draw to indicate the highest NF, e.g. 2NF in the above example. Since 3NF did not pass, the highest NF is 2NF. If you don’t do this, it is easy to make a mistake and pick the last one as the highest NF.

### Tips

• When you have finished drawing the boxes, do a final count of all attributes in R() and make sure they are all present as boxes. It’s very easy to forget the ones without any dependencies otherwise!
• Remember that for example a 3NF relation is also in 2NF. If an exam question is “is the relation in 2NF?” and you determine that the highest NF is 3NF, then the answer to the question is true. Don’t mix up “highest” vs “is in”.
• This page allows you to check that you have correctly determined the CKs and normal forms. It’s a bit quirky to use, but very convenient!

In the next post I will continue this tutorial and show how to normalize the relations to make sure they are in BCNF!

This entry was posted in Tutorials and tagged , , , , . Bookmark the permalink.

### 64 Responses to Method for determining candidate keys and highest normal form of a relation based on functional dependencies

1. Akshay says:

2. Joe says:

If CK is minimal, why did you include {B, C} as a CK for example 3?

• Dan says:

Both {A} and {B,C} are candidate keys in this example. Even if {A} “looks smaller” than {B,C}, {B,C} is still a minimal CK, because if you were to remove either B or C from {B,C} it would no longer be able to determine the relation. Thus it is minimal. {A,B,C} would not be a minimal key, because you could remove either A or B,C from it and it would still determine the relation. Remember that we can have many CKs (but only one is chosen in the end as the primary key, PK)

3. ram says:

4. ram says:

when Highest Normal form is asked,do we need to check normal form with all keys and select the normal form (or) select the Highest normal form formed by one of it’s keys???
For example, if
A -> BCDEF
B ->F
D ->E is given….
then keys =A,{B,C}.
NF achieved by A is 2NF
NF achieved by {B,C} is 1NF…. so what is Highest Normal form??? 1NF or 2NF is my doubt…..
and some where i saw asking “what is BEST normal form” ….can you clarify this also .

• Dan says:

Rule: 2NF: Each non-prime-attributes has to be fully functionally dependent from each candidate key.

But: The non-prime-attribute F is not fully functionally dependent from the candidate-key {B, C} because e.g. {B} also identifies the attribute.

Thus it is in 1NF.

The key here is to divide into primes and non-primes.

• ram says:

Thanks a lot…now I got it right….what about BEST normal form….are best and highest normal forms,both same…..i got doubt because finally,we select one of keys as primary key…so if BEST normal form is asked,we need to choose a key that provides us highest normal form…

• Dan says:

Good question. Best is subjective, so if that is the question, I suggest to ask your teacher to clarify. Theoretically highest is probably always best,but in some practical situations it is better to keep a lower normal form because for example performance issues.

• ram says:

okay…Thank you for spending time on my questions….i am learning a lot from your blogs…please keep posting as many as possible.
Thank you,

5. Aniruddha says:

Very nice artcile and helpfull, keep writting :)

6. Shareena says:

Thankyou so much. This is very helpfull.

• Dan says:

Sorry, but my understanding of redundancy in this context is limited. Perhaps some other reader can help?

7. Ksenia says:

Dan, thank you very much for such a great explanation! It is really hard to understand how to find CK in FD, but you made it clear! super!

8. chiranjiv says:

9. Harshit Mehta says:

Can you please give me a C code for this procedure!

• Dan says:

Sorry, but I have no such code…

10. Duc Cuong says:

Thank you very much for your detail article. Finally did I find something that is really useful and helps me understands clearly about how to determine CK and NF from FDs.

11. prem says:

can you help me in finding the normal for relation :
R(a,b,c,d,e)
Functional Dependencies : A->B
BC->E
ED->A

• amit says:

hi prem,

Diagrammatic representation of the FDs is:
A -> B |
C |-> E |
D | -> A

How to find the CKs:
‘A’ uniquely cant determine all the attributes. It needs ‘C’ and ‘D’. Therefore CK is { A, C, D }
Similarly, if we have ‘B’ and ‘C’, we need ‘D’ to determine all other attributes, so CK is { B, C, D }
Finally if we have ‘E’ and ‘D’, we need ‘C’ to determine all other attributes, so CK is { C, E, D }
So we have 3 CKs:
1. { A, C, D}
2. { B, C, D }
3. {C, E , D}

Non prime attributes are ones that are not part of any CK. Therefore they are ‘C’ and ‘D’

Now to check all NFs:
i) Any given relation by definition of a relation is ‘already’ in 1NF

ii) To find out 2NF:
Is there any composite Key? Ans: Yes
Then does any part of our composite keys independently determine any of the non-prime attributes?
Ans: ‘C’ and ‘D’ cannot be independently determined by any part of any CK
So 2NF condition is passed

iii. To find out 3NF:
Does the relation have any non-primes? Ans: Yes
Then, is there a non-prime that determines (point to) another non-prime?
Ans: Neither ‘C’ determines (points to) to ‘D’ nor vice versa.
So 3NF condition is passed

iv. To find out BCNF
In all given FDs, are all the determinants also candidate keys?
No, neither ‘A’ nor ‘BC’ nor ‘ED’ are candidate keys. Therefore BCNF condition does not pass.

Highest NF is 3NF

• smita says:

A->BC
B->D
CD->E
E->A
Its highest normal fom is 3NF or BCNF

• Hareish says:

CK:-A,BC,CD,E

All attributes in a relation is a key or part of key..then blindly say it is in 3NF.

it’s time for check BCNF,
in BCNF every determinent is CK then check one by one.
A determinent is CK.
B determinent is not a CK is a part of a key.

conclusion:-highest NF is 3NF

• Shaurya says:

Dear Sir

You said that non-primes do not exist in the CK’s but both “C” and “D” do exist in the CK’s … sir could you please explain this querry to me.
Thanking You

• Shaurya says:

sorry wrong comment , pardon me

• Shaurya says:

Hey

You said that non-primes do not exist in the CK’s but both “C” and “D” do exist in the CK’s … sir could you please explain this querry to me.
Thanking You

• Hi Amit,

Thanks for such a deep explanation but i am confuse about non prime attribute as non prime attribute should not should not be part of any candidate key but c and d are part of candidate key so they are non prime.

12. Marvin P says:

13. snopcoff says:

It’s really helpful, thank you very much.

14. Diamond says:

What if you have a very complex set of FDs which you cannot put in diagram form? What do you do then? Example if you have up to 8 FDs and only one of that 8 has one attribute on the LHS. All others have 2 or 3 attributes. Its going to be really complex drawing the box diagram for such a situation. So what do you do?

• Dan says:

If the drawings become too complex, then I think you need to really dig in to the theory behind functional dependencies. There is probably some deep mathematical thinking behind it that will help you to a solution, but I’m not the right guy to ask.

15. Pravin says:

16. Deepak gautam says:

great and very helpful …. thanks for this ..

17. Manasi says:

Thanks a lot !!

18. Hareish says:

my searching ends here…..thanks a lot!!
if D->E in a relation R then E->D also possible?

• Liliana says:

Yes, one of the axioms is reflection.

19. Harish says:

F = {AB→C, C→B, BC→D}
Find highest normal form?
i think it is in 3NF

• Akhil Kumar says:

Can u list the candidate keys here?

21. suhaib says:

what if in a f.d,collection of prime and non prime attribute determines non prime attribute.Which normal form is that?

22. rashmi says:

firstly i would like to thank this blog and Dan sir for a simpler explanation to identify CNs.
i have a dought in ur example 1:, if a->b and b->{a,c} then according to formal definition of 3NF a->b,b->c so a->c. hence this relation should not be in 3NF . please help me out from this confusion. hoping a solution from this blog. thanking you in advance.

• Vaughan says:

The definition strictly says that no non-prime attribute should determine another non-prime attribute.

In this case, we can see that is true.

23. arpit says:

gr8 work buddy (Y)

24. Vaughan says:

I think we have a bit of an error here. In Example 3, we list two different candidate keys. However, they are of different sizes. The candidate key should be the minimally sized one.

Is the candidate key then just “A”?

25. Romit Nagariya says:

Thank you very much! Your post helped me to get through the exams. :)

26. Anonymous says:

R(a,b,c,d,e)
Functional Dependencies : A->B
BC->E
ED->A

How to find a non-prime

27. Ilja I says:

Hi, thanks for the article
but why in the second example {A,C,F} is a candidate key ?
can D be determined by {A,C,F} ?

28. SN says:

Firstly Thanks a lot for the really helpful article.
I’ve got 2 questions though:
1) In example 2, why is {ACF} a candidate key? A leads to B and C leads to E (and F is included in the Key), but between the 3 of them none independently leads to D and neither do any of them together. The only path to D was through {B,C} collectively, so I’m assuming C alone cant lead to D. Then either:
(i) {B,C,F} is the sole candidate and {ACF} is not valid as they cant lead to D or
(ii) The {ACF} CK should include B or D to solve the issue: as either {A,B,C,F} or {A,C,D,F}

Which leads to my second question:
2) When talking about minimal CKs, you only mentioned situations where ex: CKs are: A and {B,C} then {A,B,C} is not a CK since you can drop either half and the CK still holds true.
But what about when {B,C,F} is a CK and A is not. Can {A,B,C,F} be a CK or is it not valid since you can drop A and still have a valid CK even though vice versa isnt true?

[I really hope my questions make sense to you]
Thanks again!

29. Marvin P says:

Very Nice article…….:)

30. Shubham says:

31. arun pant says:

superb…. remove my all confusion

32. Avinash Kumar says:

The following set of functional dependencies in a relation R(A,B,C,D,E,F) IS:
A->B,
A->C
D->E
The relation is 1NF,2NF,3NF or BCNF?

33. Pawan says:

34. faisal says:

Is it necessary any given set of functional dependencies is in 1NF?

35. aastha says:

R(A,B,C,D,E)
AB->DE;C->E;D->C;E->A
find out the canditate keys and is this is 3nf and bcnf?

36. Gintaras says:

This is the clearest, most amazing guide to NF’s I’ve come across. Well done, marvelous! :)

37. ritesh pandita says:

hello
can you please draw the diagram for: AB->CD , ABC->E,C->E
AND please explain the candidate keys

38. Clara says:

That was a very nice article.It definitely helped me to understand how to determine CK.

39. Clara says:

1)How can you make a relation to be in 2NF or 3NF when previously it was not in those forms.2)How to determine primary key and all foreign keys based on the functional dependencies.It will be better if you reply me with examples.

40. john says:

R(ABCDEF)
F={AB—D, F—C, E—AF }
What are the candidate key of this relation?
What is the highest normal form of this relation?

41. shiva says:

Why A is added in the ck since only BC could have determined the relation…

42. Jo says:

1NF is not automatic. It is not in 1NF if there are multivalued attributes in the table…

43. Ayan Gupta says:

Very helpful for me, to determine normal Form by any given relationship & related dependencies.
Thanks alot. It really means a lot.