Method for determining candidate keys and highest normal form of a relation based on functional dependencies

For my advanced database systems course I needed to learn how to take a given relation and functional dependencies, tell the highest normal form and then normalize it up to BCNF. It’s actually not that hard, but there are a lot of pitfalls to watch out for. Here I’m going to show the methods I learned to solve the exam questions. I need to give credit to the mentors and StinaQ for their help and tutorials!

Visualizing the dependencies

You start with a given relation and its dependencies. For example:

The trick is to draw boxes in order to visualize the dependencies (ignore the right hand side CK/NP box for now):

relation1

Example 1

Note that B->{A,C} was broken down to B->A and B->C thanks to the decomposition rule.

Let’s try a more tricky one:

relation2

Example 2

Note that B and C in this relation is a composite key that together determine D. Thus we draw a box around both of them to illustrate this dependency. And don’t forget F, even though nothing is pointing to it!

Candidate Keys & Non-Primes

It’s time to determine the candidate keys (CK:s). Recall that a CK is one or more attributes that with which you can determine all other attributes in the relation. (In the end, one of the CK:s will be chosen to be the primary key.)

To find the candidate keys:

  • Begin by selecting a start box (this can be a composite key too). If there is a box at which no arrow is pointing, this is usually a good place start. Follow the outwards arrows to the next box, and from this box to the next and so forth. The goal is to find find a way that covers all boxes.
  • If you can’t cover all boxes, try another starting box. If no single attribute (i.e. box) is able to cover all boxes by itself, the CK needs to include one or more additional attributes.
  • There can be many candidate keys, so repeat this until you are sure you haven’t missed anything.
  • If no arrow points at a box, the rule is that it must be part of all CK:s. This includes stand-alone boxes like F in example 2.
  • A candidate key is a minimal super key. Minimal means that you can’t remove any attribute from the key and still determine all other attributes. Note that you can still have keys of different sizes. For example, {A} and {B,C} in Example 3 below can both be candidate keys. But {A,B,C} would not be minimal since you could remove either {A} or {B,C} from it and still determine the relation.

Now that all CK:s are determined, also write down the non-primary attributes (NP). The Non-Primes are simply all attributes that are not part of any CK.

Finding the highest normal form

It’s time to determine the highest normal form of the relation. I will not use the exact formal definitions here, but you should learn them too in order to understand what’s happening here, especially if you need to motivate your answer on an exam!

  • Is the relation in 1NF? Yes, always!
  • Is the relation in 2NF? Is there a composite CK? If not, 2NF is automatically achieved. Otherwise, check your boxes. In any of your composite CKs: does a part of it (i.e. a box inside the composite box) by itself determine (point to) a non-prime attribute? If yes, NF2 does not pass (and the relation is thus in 1NF).
  • Is the relation in 3NF? Does the relation have any non-primes? If no, the relation is automatically in 3NF! Otherwise: Is there a non-prime that determines (point to) another non-prime (resulting in a transitive dependency because the last non-prime is determined indirectly)? If yes, 3NF does not pass (and the relation is thus in 2NF).
  • Is the relation in BCNF? Are all the determinants also candidate keys? If yes, BCNF pass. This is easiest to check by looking at the dependencies you were given (e.g. A->B and B->{A,C} in example 1) and check that each key (left hand side of the arrow) is in your list of CKs.

Let’s try this! Here is another example:

relation3

Example 3

First find the CK:s. We can easily see that {B,C} together determines A and D. Since D also determines E, starting with {B,C} can determine all other attributes and is thus a CK. But wait! Since A determines {B,C} it must be a CK on its own!

Now that we have the CK:s, we see that D and E are not part of any CK, so they are non-primary attributes.

All relations are already in 1NF, so we note this. How about 2NF? There is a composite key {B,C} so we need to check if part of it can determine a non-prime on it’s own? Not a problem; this relation is in 2NF! 3NF say that a non-prime must not determine another non-prime. Here is a problem, since D determines E. Thus 3NF does not pass, and the relation only achieves 2NF!

Note the circle I draw to indicate the highest NF, e.g. 2NF in the above example. Since 3NF did not pass, the highest NF is 2NF. If you don’t do this, it is easy to make a mistake and pick the last one as the highest NF.

Tips

  • When you have finished drawing the boxes, do a final count of all attributes in R() and make sure they are all present as boxes. It’s very easy to forget the ones without any dependencies otherwise!
  • Remember that for example a 3NF relation is also in 2NF. If an exam question is “is the relation in 2NF?” and you determine that the highest NF is 3NF, then the answer to the question is true. Don’t mix up “highest” vs “is in”.
  • This page allows you to check that you have correctly determined the CKs and normal forms. It’s a bit quirky to use, but very convenient!

In the next post I will continue this tutorial and show how to normalize the relations to make sure they are in BCNF!

Please note that I do not have time to answer questions about this topic in the comments.

This entry was posted in Tutorials and tagged , , , , . Bookmark the permalink.

64 Responses to Method for determining candidate keys and highest normal form of a relation based on functional dependencies

  1. Pingback: Method for normalizing a relation to BCNF based on functional dependencies | RLV Blog

  2. Akshay says:

    That was really helpful…Thanks

  3. Joe says:

    If CK is minimal, why did you include {B, C} as a CK for example 3?

    • Dan says:

      Both {A} and {B,C} are candidate keys in this example. Even if {A} “looks smaller” than {B,C}, {B,C} is still a minimal CK, because if you were to remove either B or C from {B,C} it would no longer be able to determine the relation. Thus it is minimal. {A,B,C} would not be a minimal key, because you could remove either A or B,C from it and it would still determine the relation. Remember that we can have many CKs (but only one is chosen in the end as the primary key, PK)

  4. ram says:

    very helpful……..Thank you very much….

  5. ram says:

    when Highest Normal form is asked,do we need to check normal form with all keys and select the normal form (or) select the Highest normal form formed by one of it’s keys???
    For example, if
    A -> BCDEF
    BC ->ADEF
    B ->F
    D ->E is given….
    then keys =A,{B,C}.
    NF achieved by A is 2NF
    NF achieved by {B,C} is 1NF…. so what is Highest Normal form??? 1NF or 2NF is my doubt…..
    and some where i saw asking “what is BEST normal form” ….can you clarify this also .
    Thanks in advance.

    • Dan says:

      This page gives then answer:

      Rule: 2NF: Each non-prime-attributes has to be fully functionally dependent from each candidate key.

      But: The non-prime-attribute F is not fully functionally dependent from the candidate-key {B, C} because e.g. {B} also identifies the attribute.

      Thus it is in 1NF.

      The key here is to divide into primes and non-primes.

      • ram says:

        Thanks a lot…now I got it right….what about BEST normal form….are best and highest normal forms,both same…..i got doubt because finally,we select one of keys as primary key…so if BEST normal form is asked,we need to choose a key that provides us highest normal form…
        Thanks in advance

        • Dan says:

          Good question. Best is subjective, so if that is the question, I suggest to ask your teacher to clarify. Theoretically highest is probably always best,but in some practical situations it is better to keep a lower normal form because for example performance issues.

          • ram says:

            okay…Thank you for spending time on my questions….i am learning a lot from your blogs…please keep posting as many as possible.
            Thank you,

  6. Aniruddha says:

    Very nice artcile and helpfull, keep writting :)

  7. Shareena says:

    Thankyou so much. This is very helpfull.
    Can you please explain about redundancy in these examples.

  8. Ksenia says:

    Dan, thank you very much for such a great explanation! It is really hard to understand how to find CK in FD, but you made it clear! super!

  9. chiranjiv says:

    Thanks… It’s very helpful :-)

  10. Harshit Mehta says:

    Can you please give me a C code for this procedure!

  11. Duc Cuong says:

    Thank you very much for your detail article. Finally did I find something that is really useful and helps me understands clearly about how to determine CK and NF from FDs.

  12. prem says:

    can you help me in finding the normal for relation :
    R(a,b,c,d,e)
    Functional Dependencies : A->B
    BC->E
    ED->A

    • amit says:

      hi prem,

      Diagrammatic representation of the FDs is:
      A -> B |
      C |-> E |
      D | -> A

      How to find the CKs:
      ‘A’ uniquely cant determine all the attributes. It needs ‘C’ and ‘D’. Therefore CK is { A, C, D }
      Similarly, if we have ‘B’ and ‘C’, we need ‘D’ to determine all other attributes, so CK is { B, C, D }
      Finally if we have ‘E’ and ‘D’, we need ‘C’ to determine all other attributes, so CK is { C, E, D }
      So we have 3 CKs:
      1. { A, C, D}
      2. { B, C, D }
      3. {C, E , D}

      Non prime attributes are ones that are not part of any CK. Therefore they are ‘C’ and ‘D’

      Now to check all NFs:
      i) Any given relation by definition of a relation is ‘already’ in 1NF

      ii) To find out 2NF:
      Is there any composite Key? Ans: Yes
      Then does any part of our composite keys independently determine any of the non-prime attributes?
      Ans: ‘C’ and ‘D’ cannot be independently determined by any part of any CK
      So 2NF condition is passed

      iii. To find out 3NF:
      Does the relation have any non-primes? Ans: Yes
      Then, is there a non-prime that determines (point to) another non-prime?
      Ans: Neither ‘C’ determines (points to) to ‘D’ nor vice versa.
      So 3NF condition is passed

      iv. To find out BCNF
      In all given FDs, are all the determinants also candidate keys?
      No, neither ‘A’ nor ‘BC’ nor ‘ED’ are candidate keys. Therefore BCNF condition does not pass.

      Highest NF is 3NF

      • smita says:

        A->BC
        B->D
        CD->E
        E->A
        Its highest normal fom is 3NF or BCNF

        • Hareish says:

          CK:-A,BC,CD,E

          All attributes in a relation is a key or part of key..then blindly say it is in 3NF.

          it’s time for check BCNF,
          in BCNF every determinent is CK then check one by one.
          A determinent is CK.
          B determinent is not a CK is a part of a key.

          conclusion:-highest NF is 3NF

      • Shaurya says:

        Dear Sir

        You said that non-primes do not exist in the CK’s but both “C” and “D” do exist in the CK’s … sir could you please explain this querry to me.
        Thanking You

      • Shaurya says:

        Hey

        You said that non-primes do not exist in the CK’s but both “C” and “D” do exist in the CK’s … sir could you please explain this querry to me.
        Thanking You

      • Hi Amit,

        Thanks for such a deep explanation but i am confuse about non prime attribute as non prime attribute should not should not be part of any candidate key but c and d are part of candidate key so they are non prime.

  13. Marvin P says:

    Super….really helpful…thanks a lot

  14. snopcoff says:

    It’s really helpful, thank you very much.

  15. Diamond says:

    What if you have a very complex set of FDs which you cannot put in diagram form? What do you do then? Example if you have up to 8 FDs and only one of that 8 has one attribute on the LHS. All others have 2 or 3 attributes. Its going to be really complex drawing the box diagram for such a situation. So what do you do?

    • Dan says:

      If the drawings become too complex, then I think you need to really dig in to the theory behind functional dependencies. There is probably some deep mathematical thinking behind it that will help you to a solution, but I’m not the right guy to ask.

  16. Pravin says:

    Really it’s helpful

  17. Deepak gautam says:

    great and very helpful …. thanks for this ..

  18. Manasi says:

    It’s very helpful.
    Thanks a lot !!

  19. Hareish says:

    my searching ends here…..thanks a lot!!
    if D->E in a relation R then E->D also possible?

  20. Harish says:

    F = {AB→C, C→B, BC→D}
    Find highest normal form?
    i think it is in 3NF

  21. Jay Sadhwani says:

    Thanks for this article DAN…… Keep on posting

  22. suhaib says:

    what if in a f.d,collection of prime and non prime attribute determines non prime attribute.Which normal form is that?

  23. rashmi says:

    firstly i would like to thank this blog and Dan sir for a simpler explanation to identify CNs.
    i have a dought in ur example 1:, if a->b and b->{a,c} then according to formal definition of 3NF a->b,b->c so a->c. hence this relation should not be in 3NF . please help me out from this confusion. hoping a solution from this blog. thanking you in advance.

    • Vaughan says:

      The definition strictly says that no non-prime attribute should determine another non-prime attribute.

      In this case, we can see that is true.

  24. arpit says:

    gr8 work buddy (Y)

  25. Vaughan says:

    I think we have a bit of an error here. In Example 3, we list two different candidate keys. However, they are of different sizes. The candidate key should be the minimally sized one.

    Is the candidate key then just “A”?

  26. Romit Nagariya says:

    Thank you very much! Your post helped me to get through the exams. :)

  27. Anonymous says:

    R(a,b,c,d,e)
    Functional Dependencies : A->B
    BC->E
    ED->A

    How to find a non-prime

  28. Ilja I says:

    Hi, thanks for the article
    but why in the second example {A,C,F} is a candidate key ?
    can D be determined by {A,C,F} ?

  29. SN says:

    Firstly Thanks a lot for the really helpful article.
    I’ve got 2 questions though:
    1) In example 2, why is {ACF} a candidate key? A leads to B and C leads to E (and F is included in the Key), but between the 3 of them none independently leads to D and neither do any of them together. The only path to D was through {B,C} collectively, so I’m assuming C alone cant lead to D. Then either:
    (i) {B,C,F} is the sole candidate and {ACF} is not valid as they cant lead to D or
    (ii) The {ACF} CK should include B or D to solve the issue: as either {A,B,C,F} or {A,C,D,F}

    Which leads to my second question:
    2) When talking about minimal CKs, you only mentioned situations where ex: CKs are: A and {B,C} then {A,B,C} is not a CK since you can drop either half and the CK still holds true.
    But what about when {B,C,F} is a CK and A is not. Can {A,B,C,F} be a CK or is it not valid since you can drop A and still have a valid CK even though vice versa isnt true?

    [I really hope my questions make sense to you]
    Thanks again!

  30. Marvin P says:

    Very Nice article…….:)

  31. Shubham says:

    Its really help full article..

  32. arun pant says:

    superb…. remove my all confusion

  33. Avinash Kumar says:

    The following set of functional dependencies in a relation R(A,B,C,D,E,F) IS:
    A->B,
    A->C
    D->E
    AD->F
    The relation is 1NF,2NF,3NF or BCNF?

  34. Pawan says:

    it’s really a helpful article

  35. faisal says:

    Is it necessary any given set of functional dependencies is in 1NF?

  36. aastha says:

    R(A,B,C,D,E)
    AB->DE;C->E;D->C;E->A
    find out the canditate keys and is this is 3nf and bcnf?

  37. Gintaras says:

    This is the clearest, most amazing guide to NF’s I’ve come across. Well done, marvelous! :)

  38. ritesh pandita says:

    hello
    can you please draw the diagram for: AB->CD , ABC->E,C->E
    AND please explain the candidate keys

  39. Clara says:

    That was a very nice article.It definitely helped me to understand how to determine CK.

  40. Clara says:

    1)How can you make a relation to be in 2NF or 3NF when previously it was not in those forms.2)How to determine primary key and all foreign keys based on the functional dependencies.It will be better if you reply me with examples.

  41. john says:

    R(ABCDEF)
    F={AB—D, F—C, E—AF }
    What are the candidate key of this relation?
    What is the highest normal form of this relation?

  42. shiva says:

    Why A is added in the ck since only BC could have determined the relation…

  43. Jo says:

    1NF is not automatic. It is not in 1NF if there are multivalued attributes in the table…

  44. Ayan Gupta says:

    Very helpful for me, to determine normal Form by any given relationship & related dependencies.
    Thanks alot. It really means a lot.

  45. Grateful Undead says:

    You saved my life, I am eternally grateful.

  46. Shyam says:

    Very helpful thanks

Leave a Reply

Your email address will not be published. Required fields are marked *