[Solved] Vigenere Cipher c# with “ñ”


Your code:

if (Char.IsLetter(s[i]))
{
        s[i] = (char)(s[i] + key[j] - 'A');
        if (s[i] > 'Z') s[i] = (char)(s[i] - 'Z' + 'A' - 1);
}

This depends on the fact that the characters from U+0041 to U+005A just happen to match the letters of the alphabets of some languages, such as English*. (If the test had depended on that range instead of just checking that the character was a letter, you would have left Ñ unchanged rather than getting an error.) Some other languages have alphabets that are contiguous and in order in the UCS, but most do not.
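To see the failure concretely, here is a small sketch that applies the same arithmetic to a single character. The names RangeDemo and ShiftAscii are mine, for illustration only. Ñ is U+00D1, so Char.IsLetter accepts it, but it lies well outside U+0041 to U+005A, and the wrap-around lands on something that is not a letter at all:

```csharp
using System;

static class RangeDemo
{
    // Same arithmetic as the question's code, applied to one character.
    // Hypothetical helper, not part of the original program.
    public static char ShiftAscii(char c, char k)
    {
        char r = (char)(c + k - 'A');
        if (r > 'Z') r = (char)(r - 'Z' + 'A' - 1);
        return r;
    }

    public static void Demo()
    {
        Console.WriteLine(Char.IsLetter('Ñ'));         // True
        Console.WriteLine((int)'Ñ');                   // 209, i.e. U+00D1
        Console.WriteLine((int)ShiftAscii('Ñ', 'B'));  // 184, i.e. U+00B8 (a cedilla), not a letter
        Console.WriteLine(ShiftAscii('Y', 'C'));       // A, the in-range case wraps correctly
    }
}
```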

For this reason you’ll need to define your own alphabet. A string is a simple enough way to do this for most uses.

string spanishAlphabet = "ABCDEFGHIJKLMNÑOPQRSTUVWXYZ";
string englishAlphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
string irishAlphabet = "ABCDEFGHILMNOPRSTU";
string danishAlphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ";
string norwegianAlphabet = danishAlphabet;

Then instead of depending on coincidences between an alphabet and the UCS, you can use the alphabet you care about:

static void VigenereEncrypt(StringBuilder s, string key, string alphabet)
{
  for (int i = 0; i < s.Length; i++) s[i] = Char.ToUpper(s[i]);
  key = key.ToUpper();
  int j = 0;
  for (int i = 0; i < s.Length; i++)
  {
    if(alphabet.Contains(s[i]))
      s[i] = alphabet[(alphabet.IndexOf(s[i]) + alphabet.IndexOf(key[j])) % alphabet.Length];
    j = (j + 1) % key.Length;
  }
}

static void VigenereDecrypt(StringBuilder s, string key, string alphabet)
{
  for (int i = 0; i < s.Length; i++) s[i] = Char.ToUpper(s[i]);
  key = key.ToUpper();
  int j = 0;
  for (int i = 0; i < s.Length; i++)
  {
    if(alphabet.Contains(s[i]))
      s[i] = alphabet[(alphabet.IndexOf(s[i]) - alphabet.IndexOf(key[j]) + alphabet.Length) % alphabet.Length];
    // Advance the key on every character, as VigenereEncrypt does, so that the two methods round-trip
    j = (j + 1) % key.Length;
  }
}

(I’m assuming that the key is always composed solely of letters from the alphabet in question. A more robust solution wouldn’t make that assumption, but there are a few different approaches to what one should do in such a case, so there isn’t a single correct way to deal with it, and I have ignored the issue.)
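One of those approaches, sketched here as an assumption rather than a recommendation, is simply to reject a bad key up front. It matters because IndexOf returns -1 for a character that is not in the alphabet: the encrypting method would then either shift by -1 or, when the text letter is at index 0, index the alphabet at -1 and throw. KeyCheck and NormalizeKey are hypothetical names:

```csharp
using System;

static class KeyCheck
{
    // Upper-cases the key and throws if it is empty or contains a character
    // that is not in the alphabet. Skipping or stripping such characters
    // would be equally defensible policies.
    public static string NormalizeKey(string key, string alphabet)
    {
        if (string.IsNullOrEmpty(key))
            throw new ArgumentException("Key must not be empty.", nameof(key));
        key = key.ToUpper();
        foreach (char c in key)
            if (alphabet.IndexOf(c) < 0)
                throw new ArgumentException(
                    "Key contains '" + c + "', which is not in the alphabet.", nameof(key));
        return key;
    }
}
```

Calling key = KeyCheck.NormalizeKey(key, alphabet) at the top of each method would replace the key.ToUpper() line.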

I also took out the ref keyword, since the StringBuilder is mutated in place rather than replaced with a different reference, as that signature would suggest. A more idiomatic approach, though, would be to receive a string and return another:

static string VigenereEncrypt(string s, string key, string alphabet)
{
  s = s.ToUpper();
  key = key.ToUpper();
  int j = 0;
  StringBuilder ret = new StringBuilder(s.Length);
  for (int i = 0; i < s.Length; i++)
  {
    if(alphabet.Contains(s[i]))
      ret.Append(alphabet[(alphabet.IndexOf(s[i]) + alphabet.IndexOf(key[j])) % alphabet.Length]);
    else
      ret.Append(s[i]);
    j = (j + 1) % key.Length;
  }
  return ret.ToString();
}

static string VigenereDecrypt(string s, string key, string alphabet)
{
  s = s.ToUpper();
  key = key.ToUpper();
  int j = 0;
  StringBuilder ret = new StringBuilder(s.Length);
  for (int i = 0; i < s.Length; i++)
  {
    if(alphabet.Contains(s[i]))
      ret.Append(alphabet[(alphabet.IndexOf(s[i]) - alphabet.IndexOf(key[j]) + alphabet.Length) % alphabet.Length]);
    else
      ret.Append(s[i]);
    j = (j + 1) % key.Length;
  }
  return ret.ToString();
}
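The two string-returning methods differ only in the sign of the key offset, so they could share one helper. The following is a sketch of that idea with a round trip through the Spanish alphabet. VigenereSketch, Shift and the direction parameter are names I have made up, and it still assumes the key contains only letters from the alphabet:

```csharp
using System;
using System.Text;

static class VigenereSketch
{
    // direction is +1 to encrypt and -1 to decrypt. The key advances on every
    // character, matching the string-returning methods above.
    static string Shift(string s, string key, string alphabet, int direction)
    {
        s = s.ToUpper();
        key = key.ToUpper();
        int n = alphabet.Length;
        var ret = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            int p = alphabet.IndexOf(s[i]);
            int k = alphabet.IndexOf(key[i % key.Length]);
            // Adding n keeps the left operand of % non-negative when decrypting.
            ret.Append(p < 0 ? s[i] : alphabet[(p + direction * k + n) % n]);
        }
        return ret.ToString();
    }

    public static string Encrypt(string s, string key, string alphabet) => Shift(s, key, alphabet, +1);
    public static string Decrypt(string s, string key, string alphabet) => Shift(s, key, alphabet, -1);

    public static void Demo()
    {
        string spanishAlphabet = "ABCDEFGHIJKLMNÑOPQRSTUVWXYZ";
        string cipher = Encrypt("España", "clave", spanishAlphabet);
        Console.WriteLine(cipher);                                     // GDPVRC
        Console.WriteLine(Decrypt(cipher, "clave", spanishAlphabet));  // ESPAÑA
    }
}
```

Note that Ñ (index 14) plus E (index 4) gives R (index 18), which is the behaviour the original range arithmetic could not provide.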

If you want to treat strings that Unicode does not consider a single character as a letter, e.g. IJ in Dutch†, this gets more complicated. One possibility is to choose a marker character for such a sequence, replace each occurrence of the sequence with the marker before encrypting‡, and then replace the marker with the sequence again wherever it appears in the output. One would have to be sure that the marker character didn’t appear in the input, which would make non-characters like U+FFFE useful here.
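A minimal sketch of that substitution, assuming U+0132 as the marker (the choice discussed in the footnote). DigraphMarker, Fold and Unfold are hypothetical names, and the alphabet passed to the cipher would need to contain the marker as one of its letters:

```csharp
using System;

static class DigraphMarker
{
    const string Digraph = "IJ";
    const string Marker = "\u0132"; // LATIN CAPITAL LIGATURE IJ

    // Collapse each IJ into the one-character marker before encrypting.
    public static string Fold(string s) => s.ToUpperInvariant().Replace(Digraph, Marker);

    // Expand the marker back into IJ afterwards.
    public static string Unfold(string s) => s.Replace(Marker, Digraph);

    public static void Demo()
    {
        string folded = Fold("ijsvrij");
        Console.WriteLine(folded.Length);   // 5: the two IJ pairs are now one character each
        Console.WriteLine(Unfold(folded));  // IJSVRIJ
    }
}
```

One caveat with this sketch: if the ciphertext itself is unfolded, an I followed by a J that were encrypted separately can no longer be told apart from an expanded marker, so it is probably safer to keep the marker in the ciphertext and only unfold after decrypting.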

Diacritics that are not considered to form separate letters of the alphabet (as Ñ is in Spanish) are another complication. In the days when cyphers like the Vigenère were actually used, it was common to just strip diacritics and accept that the output would lack diacritics it should have. An easy way to do that is to use a method like:

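// Needs: using System; using System.Collections.Generic; using System.Globalization; using System.Text;
public static IEnumerable<char> RemoveDiacriticsEnum(string src, string alphabet, Func<char, char> customFolding = null)
{
  // Compose first so that e.g. N + U+0303 is seen as the single character Ñ
  foreach(char c in src.Normalize(NormalizationForm.FormC))
  {
    // Catch e.g. Ñ in Spanish, considered letter in own right (surrogate halves are passed through as-is)
    if(alphabet.Contains(c) || Char.IsSurrogate(c))
    {
      yield return c;
      continue;
    }
    // Decompose only characters outside the alphabet, then drop the combining marks
    foreach(char d in c.ToString().Normalize(NormalizationForm.FormD))
      switch(CharUnicodeInfo.GetUnicodeCategory(d))
      {
        case UnicodeCategory.NonSpacingMark:
        case UnicodeCategory.SpacingCombiningMark:
        case UnicodeCategory.EnclosingMark:
          //do nothing
          break;
        default:
          // customFolding is an optional extra mapping for letters that do not decompose, e.g. Ø
          yield return customFolding == null ? d : customFolding(d);
          break;
      }
  }
}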

And then use a loop that does foreach(char c in RemoveDiacriticsEnum(s, alphabet)) and uses c where the code above uses s[i]. This won’t cover all cases, see https://stackoverflow.com/a/3769995/400547 for some of the possible complications.

Alternatively, one could include the common accent combinations in the alphabet:

string spanishAlphabet = "AÁBCDEÉFGHIÍJKLMNÑOÓPQRSTUÚÜVWXYZ";

*Strictly speaking, there are a variety of conventions about where some other characters, particularly Ð, Ȝ and Þ, should be positioned if used, so one version of the modern English alphabet is A,B,C,D,[Ð],E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,[Ȝ],Z,[Þ]. That is, one would generally not list Ð, but if there was a word in your data that started with it, you’d position it between D and E, and so on. This is an obscure case in Modern English (we don’t really use those letters any more), but it can be more significant in some other languages. For example, the Irish alphabet is A,B,C,D,E,F,G,H,I,L,M,N,O,P,R,S,T,U, but V is used in a few onomatopœic words, and J,K,Q,V,W,X,Y,Z are each found in some loan words. So we could list the Irish alphabet as A,B,C,D,E,F,G,H,I,[J],[K],L,M,N,O,P,[Q],R,S,T,U,[V],[W],[X],[Y],[Z], not generally listing the letters in brackets, but positioning e.g. J between I and L if a word beginning with J is in a set of data. This complicates the question of cyphers like Vigenère, because we have to either use letters not strictly part of the alphabet in the calculation or else not encrypt the V of a word like vótaí.

†While there is an Ĳ character in the UCS at U+0132, this is for compatibility with legacy encodings. Still, using Ĳ as the marker character for IJ would neatly handle both IJ and data that had used Ĳ.

‡Encrypting in a rather loose sense, since this encryption scheme was broken by the middle of the 19th century.
