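[C#] Determining whether a string is Unicode
The title says "determining whether a string is Unicode,"
but what this post actually covers is determining a character's Unicode category.
If you have programmed in C/C++,
you have probably written code at least once that checks whether a character is an English letter,
along the lines of "if the ASCII code falls between this value and that value, it is an English letter."
(Or a program that checks whether a letter is uppercase or lowercase.)
Because C# is Unicode-based,
it comes with a number of convenient classes.
The relevant one here is
the CharUnicodeInfo class in the System.Globalization namespace.
Call CharUnicodeInfo.GetUnicodeCategory on the character you want to classify.
For example: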


    using System;
    using System.Collections.Generic;
    using System.Globalization;

    namespace WindowsApplication3
    {
        static class Program
        {
            static void Main()
            {
                string testKorean = "abcd한글";
                string testDigit = "abc123";
                string testSymbol = "\\*&";
                string testSpace = "abc def";

                PrintCategory(testKorean, GetCodeType(testKorean));
                PrintCategory(testDigit, GetCodeType(testDigit));
                PrintCategory(testSymbol, GetCodeType(testSymbol));
                PrintCategory(testSpace, GetCodeType(testSpace));
            }

            // Returns the distinct Unicode categories found in str, in order of
            // first appearance, or null when str is null or empty.
            static UnicodeCategory[] GetCodeType(string str)
            {
                if (string.IsNullOrEmpty(str))
                {
                    return null;
                }

                List<UnicodeCategory> categories = new List<UnicodeCategory>();
                for (int i = 0; i < str.Length; i++)
                {
                    // str[i] is a single UTF-16 code unit.
                    UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(str[i]);
                    if (!categories.Contains(category))
                    {
                        categories.Add(category);
                    }
                }
                return categories.ToArray();
            }

            // Prints the string on one line and its categories on the next.
            static void PrintCategory(string str, UnicodeCategory[] categories)
            {
                Console.WriteLine(str + " : ");
                if (categories != null)
                {
                    for (int i = 0; i < categories.Length; i++)
                    {
                        Console.Write(categories[i].ToString() + ", ");
                    }
                    Console.WriteLine();
                }
            }
        }
    }

Running the code above prints, for each test string, the distinct Unicode categories it contains: LowercaseLetter and OtherLetter for "abcd한글", LowercaseLetter and DecimalDigitNumber for "abc123", OtherPunctuation for the symbol string, and LowercaseLetter and SpaceSeparator for "abc def".
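
To tie this back to the C-style range check mentioned at the start, here is a minimal sketch that compares the two approaches. The helper names IsAsciiLetter and IsUnicodeLetter are hypothetical and exist only for this illustration. Treating the five letter categories (Lu, Ll, Lt, Lm, Lo) as "a letter" is an assumption made here; it matches what char.IsLetter is documented to check.

```csharp
using System;
using System.Globalization;

static class LetterCheckSketch
{
    // C-style check: only the ASCII ranges 'A'-'Z' and 'a'-'z' count as letters.
    public static bool IsAsciiLetter(char c)
    {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Category-based check (hypothetical helper): any of the five Unicode
    // letter categories counts as a letter.
    public static bool IsUnicodeLetter(char c)
    {
        switch (CharUnicodeInfo.GetUnicodeCategory(c))
        {
            case UnicodeCategory.UppercaseLetter:
            case UnicodeCategory.LowercaseLetter:
            case UnicodeCategory.TitlecaseLetter:
            case UnicodeCategory.ModifierLetter:
            case UnicodeCategory.OtherLetter:
                return true;
            default:
                return false;
        }
    }

    static void Main()
    {
        // One ASCII letter, one Hangul syllable, one digit.
        foreach (char c in "a한1")
        {
            Console.WriteLine("{0}: ascii={1}, unicode={2}",
                c, IsAsciiLetter(c), IsUnicodeLetter(c));
        }
    }
}
```

The range check rejects the Hangul syllable, while the category check accepts it as OtherLetter; both reject the digit.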

The categories that can be determined are listed below.
http://msdn2.microsoft.com/en-us/librar ··· ory.aspx
| Member name | Description |
|---|---|
| ClosePunctuation | Indicates that the character is the closing character of one of the paired punctuation marks, such as parentheses, square brackets, and braces. Signified by the Unicode designation "Pe" (punctuation, close). The value is 21. |
| ConnectorPunctuation | Indicates that the character is a connector punctuation, which connects two characters. Signified by the Unicode designation "Pc" (punctuation, connector). The value is 18. |
| Control | Indicates that the character is a control code, whose Unicode value is U+007F or in the range U+0000 through U+001F or U+0080 through U+009F. Signified by the Unicode designation "Cc" (other, control). The value is 14. |
| CurrencySymbol | Indicates that the character is a currency symbol. Signified by the Unicode designation "Sc" (symbol, currency). The value is 26. |
| DashPunctuation | Indicates that the character is a dash or a hyphen. Signified by the Unicode designation "Pd" (punctuation, dash). The value is 19. |
| DecimalDigitNumber | Indicates that the character is a decimal digit; that is, in the range 0 through 9. Signified by the Unicode designation "Nd" (number, decimal digit). The value is 8. |
| EnclosingMark | Indicates that the character is an enclosing mark, which is a nonspacing combining character that surrounds all previous characters up to and including a base character. Signified by the Unicode designation "Me" (mark, enclosing). The value is 7. |
| FinalQuotePunctuation | Indicates that the character is a closing or final quotation mark. Signified by the Unicode designation "Pf" (punctuation, final quote). The value is 23. |
| Format | Indicates that the character is a format character, which is not normally rendered but affects the layout of text or the operation of text processes. Signified by the Unicode designation "Cf" (other, format). The value is 15. |
| InitialQuotePunctuation | Indicates that the character is an opening or initial quotation mark. Signified by the Unicode designation "Pi" (punctuation, initial quote). The value is 22. |
| LetterNumber | Indicates that the character is a number represented by a letter, instead of a decimal digit; for example, the Roman numeral for five, which is 'V'. Signified by the Unicode designation "Nl" (number, letter). The value is 9. |
| LineSeparator | Indicates that the character is used to separate lines of text. Signified by the Unicode designation "Zl" (separator, line). The value is 12. |
| LowercaseLetter | Indicates that the character is a lowercase letter. Signified by the Unicode designation "Ll" (letter, lowercase). The value is 1. |
| MathSymbol | Indicates that the character is a mathematical symbol, such as '+' or '='. Signified by the Unicode designation "Sm" (symbol, math). The value is 25. |
| ModifierLetter | Indicates that the character is a modifier letter, which is a free-standing spacing character that indicates modifications of a preceding letter. Signified by the Unicode designation "Lm" (letter, modifier). The value is 3. |
| ModifierSymbol | Indicates that the character is a modifier symbol, which indicates modifications of surrounding characters; for example, the fraction slash indicates that the number to the left is the numerator and the number to the right is the denominator. Signified by the Unicode designation "Sk" (symbol, modifier). The value is 27. |
| NonSpacingMark | Indicates that the character is a nonspacing character, which indicates modifications of a base character. Signified by the Unicode designation "Mn" (mark, nonspacing). The value is 5. |
| OpenPunctuation | Indicates that the character is the opening character of one of the paired punctuation marks, such as parentheses, square brackets, and braces. Signified by the Unicode designation "Ps" (punctuation, open). The value is 20. |
| OtherLetter | Indicates that the character is a letter that is not an uppercase letter, a lowercase letter, a titlecase letter, or a modifier letter. Signified by the Unicode designation "Lo" (letter, other). The value is 4. |
| OtherNotAssigned | Indicates that the character is not assigned to any Unicode category. Signified by the Unicode designation "Cn" (other, not assigned). The value is 29. |
| OtherNumber | Indicates that the character is a number that is neither a decimal digit nor a letter number; for example, the fraction 1/2. Signified by the Unicode designation "No" (number, other). The value is 10. |
| OtherPunctuation | Indicates that the character is a punctuation that is not a connector punctuation, a dash punctuation, an open punctuation, a close punctuation, an initial quote punctuation, or a final quote punctuation. Signified by the Unicode designation "Po" (punctuation, other). The value is 24. |
| OtherSymbol | Indicates that the character is a symbol that is not a mathematical symbol, a currency symbol or a modifier symbol. Signified by the Unicode designation "So" (symbol, other). The value is 28. |
| ParagraphSeparator | Indicates that the character is used to separate paragraphs. Signified by the Unicode designation "Zp" (separator, paragraph). The value is 13. |
| PrivateUse | Indicates that the character is a private-use character, whose Unicode value is in the range U+E000 through U+F8FF. Signified by the Unicode designation "Co" (other, private use). The value is 17. |
| SpaceSeparator | Indicates that the character is a space character, which has no glyph but is not a control or format character. Signified by the Unicode designation "Zs" (separator, space). The value is 11. |
| SpacingCombiningMark | Indicates that the character is a spacing character, which indicates modifications of a base character and affects the width of the glyph for that base character. Signified by the Unicode designation "Mc" (mark, spacing combining). The value is 6. |
| Surrogate | Indicates that the character is a high-surrogate or a low-surrogate. Surrogate code values are in the range U+D800 through U+DFFF. Signified by the Unicode designation "Cs" (other, surrogate). The value is 16. |
| TitlecaseLetter | Indicates that the character is a titlecase letter. Signified by the Unicode designation "Lt" (letter, titlecase). The value is 2. |
| UppercaseLetter | Indicates that the character is an uppercase letter. Signified by the Unicode designation "Lu" (letter, uppercase). The value is 0. |
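
The Surrogate row is worth a note for the sample program: indexing a string with str[i] yields UTF-16 code units, so a character outside the Basic Multilingual Plane shows up as Surrogate rather than as its real category. The following is a minimal sketch of one way to classify by code point instead, using the GetUnicodeCategory(string, int) overload together with char.IsSurrogatePair. The helper name GetCategoriesByCodePoint is hypothetical, and returning an empty list for null or empty input is a choice made for this sketch, not the behavior of the original GetCodeType.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class CodePointSketch
{
    // Hypothetical helper: returns one category per code point,
    // treating a surrogate pair as a single character.
    public static List<UnicodeCategory> GetCategoriesByCodePoint(string str)
    {
        List<UnicodeCategory> result = new List<UnicodeCategory>();
        if (string.IsNullOrEmpty(str))
        {
            return result;
        }

        for (int i = 0; i < str.Length; i++)
        {
            // The (string, int) overload classifies the whole pair starting at i.
            result.Add(CharUnicodeInfo.GetUnicodeCategory(str, i));
            if (char.IsSurrogatePair(str, i))
            {
                i++; // skip the low surrogate that was just consumed
            }
        }
        return result;
    }

    static void Main()
    {
        // U+1D49C (MATHEMATICAL SCRIPT CAPITAL A) lies outside the BMP,
        // so it occupies two chars in the string.
        string text = "a\U0001D49C";

        // Classifying a single code unit of the pair.
        Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory(text[1]));

        // Classifying by code point.
        foreach (UnicodeCategory category in GetCategoriesByCodePoint(text))
        {
            Console.WriteLine(category);
        }
    }
}
```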
In addition, CharUnicodeInfo provides a few functions whose names alone should be welcome:
GetDecimalDigitValue()
GetDigitValue()
GetNumericValue()
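
As a minimal sketch of how the three CharUnicodeInfo functions GetDecimalDigitValue, GetDigitValue, and GetNumericValue differ, the snippet below calls each one on a handful of characters. The sample characters are chosen here purely for illustration and do not come from the original post.

```csharp
using System;
using System.Globalization;

static class NumericValueSketch
{
    static void Main()
    {
        // '5'      : an ordinary decimal digit
        // '\u00B2' : SUPERSCRIPT TWO (a digit, but not a decimal digit)
        // '\u00BD' : VULGAR FRACTION ONE HALF (numeric, but not a digit)
        // 'a'      : not numeric at all
        char[] samples = { '5', '\u00B2', '\u00BD', 'a' };

        foreach (char c in samples)
        {
            Console.WriteLine("U+{0:X4}: decimal={1}, digit={2}, numeric={3}",
                (int)c,
                CharUnicodeInfo.GetDecimalDigitValue(c),
                CharUnicodeInfo.GetDigitValue(c),
                CharUnicodeInfo.GetNumericValue(c));
        }
    }
}
```

Each method returns -1 when the character has no value of the kind it looks for, so the three progressively wider definitions of "number" can be told apart by which calls return -1.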
2007/07/13 19:06