public abstract class Collator
extends Object
implements Comparator<Object>, Freezable<Collator>, Cloneable
java.lang.Object | |
↳ | android.icu.text.Collator |
Known Direct Subclasses | RuleBasedCollator |
[icu enhancement] ICU's replacement for java.text.Collator. Methods, fields, and other functionality specific to ICU are labeled '[icu]'.
Collator performs locale-sensitive string comparison. A concrete subclass, RuleBasedCollator, allows customization of the collation ordering by the use of rule sets.
A Collator is thread-safe only when frozen. See isFrozen() and Freezable.
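Following the Unicode Consortium's specifications for the Unicode Collation Algorithm (UCA), there are 5 different levels of strength used in comparisons: PRIMARY, SECONDARY, TERTIARY, QUATERNARY, and IDENTICAL. Each level is described under the constant of the same name.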
For more information about the collation service see the User Guide.
Examples of use
    // Get the Collator for US English and set its strength to PRIMARY
    Collator usCollator = Collator.getInstance(Locale.US);
    usCollator.setStrength(Collator.PRIMARY);
    if (usCollator.compare("abc", "ABC") == 0) {
        System.out.println("Strings are equivalent");
    }

The following example shows how to compare two strings using the Collator for the default locale.

    // Compare two strings in the default locale
    Collator myCollator = Collator.getInstance();
    myCollator.setDecomposition(Collator.NO_DECOMPOSITION);
    if (myCollator.compare("\u00E0\u0325", "a\u0325\u0300") != 0) {
        System.out.println("\u00E0\u0325 is not equal to a\u0325\u0300 without decomposition");
        myCollator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        if (myCollator.compare("\u00E0\u0325", "a\u0325\u0300") != 0) {
            System.out.println("Error: \u00E0\u0325 should be equal to a\u0325\u0300 with decomposition");
        } else {
            System.out.println("\u00E0\u0325 is equal to a\u0325\u0300 with decomposition");
        }
    } else {
        System.out.println("Error: \u00E0\u0325 should not be equal to a\u0325\u0300 without decomposition");
    }
Nested classes | | |
---|---|---|
interface | Collator.ReorderCodes | Reordering codes for non-script groups that can be reordered under collation. |
Constants | | |
---|---|---|
int | CANONICAL_DECOMPOSITION | Decomposition mode value. |
int | FULL_DECOMPOSITION | [icu] Note: This is for backwards compatibility with Java APIs only. |
int | IDENTICAL | Smallest Collator strength value. |
int | NO_DECOMPOSITION | Decomposition mode value. |
int | PRIMARY | Strongest collator strength value. |
int | QUATERNARY | [icu] Fourth level collator strength value. |
int | SECONDARY | Second level collator strength value. |
int | TERTIARY | Third level collator strength value. |
Protected constructors | |
---|---|
Collator() | Empty default constructor to make javadocs happy |
Public methods | | |
---|---|---|
Object | clone() | Clones the collator. |
Collator | cloneAsThawed() | Provides for the clone operation. |
int | compare(Object source, Object target) | Compares the source Object to the target Object. |
abstract int | compare(String source, String target) | Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode. |
boolean | equals(String source, String target) | Compares the equality of two text Strings using this Collator's rules, strength and decomposition mode. |
boolean | equals(Object obj) | Compares the equality of two Collator objects. |
Collator | freeze() | Freezes the collator. |
static Locale[] | getAvailableLocales() | Returns the set of locales, as Locale objects, for which collators are installed. |
static final ULocale[] | getAvailableULocales() | [icu] Returns the set of locales, as ULocale objects, for which collators are installed. |
abstract CollationKey | getCollationKey(String source) | Transforms the String into a CollationKey suitable for efficient repeated comparison. |
int | getDecomposition() | Returns the decomposition mode of this Collator. |
static String | getDisplayName(Locale objectLocale, Locale displayLocale) | [icu] Returns the name of the collator for the objectLocale, localized for the displayLocale. |
static String | getDisplayName(ULocale objectLocale) | [icu] Returns the name of the collator for the objectLocale, localized for the default DISPLAY locale. |
static String | getDisplayName(ULocale objectLocale, ULocale displayLocale) | [icu] Returns the name of the collator for the objectLocale, localized for the displayLocale. |
static String | getDisplayName(Locale objectLocale) | [icu] Returns the name of the collator for the objectLocale, localized for the default DISPLAY locale. |
static int[] | getEquivalentReorderCodes(int reorderCode) | Retrieves all the reorder codes that are grouped with the given reorder code. |
static final ULocale | getFunctionalEquivalent(String keyword, ULocale locID) | [icu] Returns the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service. |
static final ULocale | getFunctionalEquivalent(String keyword, ULocale locID, boolean[] isAvailable) | [icu] Returns the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service. |
static final Collator | getInstance() | Returns the Collator for the current default locale. |
static final Collator | getInstance(Locale locale) | Returns the Collator for the desired locale. |
static final Collator | getInstance(ULocale locale) | [icu] Returns the Collator for the desired locale. |
static final String[] | getKeywordValues(String keyword) | [icu] Given a keyword, returns an array of all values for that keyword that are currently in use. |
static final String[] | getKeywordValuesForLocale(String key, ULocale locale, boolean commonlyUsed) | [icu] Given a key and a locale, returns an array of string values in a preferred order that would make a difference. |
static final String[] | getKeywords() | [icu] Returns an array of all possible keywords that are relevant to collation. |
int | getMaxVariable() | [icu] Returns the maximum reordering group whose characters are affected by the alternate handling behavior. |
int[] | getReorderCodes() | Retrieves the reordering codes for this collator. |
int | getStrength() | Returns this Collator's strength attribute. |
UnicodeSet | getTailoredSet() | [icu] Returns a UnicodeSet that contains all the characters and sequences tailored in this collator. |
abstract VersionInfo | getUCAVersion() | [icu] Returns the UCA version of this collator object. |
abstract int | getVariableTop() | [icu] Gets the variable top value of a Collator. |
abstract VersionInfo | getVersion() | [icu] Returns the version of this collator object. |
boolean | isFrozen() | Determines whether the object has been frozen or not. |
void | setDecomposition(int decomposition) | Sets the decomposition mode of this Collator. |
Collator | setMaxVariable(int group) | [icu] Sets the variable top to the top of the specified reordering group. |
void | setReorderCodes(int... order) | Sets the reordering codes for this collator. |
void | setStrength(int newStrength) | Sets this Collator's strength attribute. |
Inherited methods | |
---|---|
From class java.lang.Object | |
From interface java.util.Comparator | |
From interface android.icu.util.Freezable | |
int CANONICAL_DECOMPOSITION
Decomposition mode value. With CANONICAL_DECOMPOSITION set, characters that are canonical variants according to the Unicode standard will be decomposed for collation.
CANONICAL_DECOMPOSITION corresponds to Normalization Form D as described in Unicode Technical Report #15.
Constant Value: 17 (0x00000011)
int FULL_DECOMPOSITION
[icu] Note: This is for backwards compatibility with Java APIs only. It should not be used; use IDENTICAL instead. ICU's collation does not support Java's FULL_DECOMPOSITION mode.
Constant Value: 15 (0x0000000f)
int IDENTICAL
Smallest Collator strength value. When all other strengths are equal, the IDENTICAL strength is used as a tiebreaker. The Unicode code point values of the NFD form of each string are compared, just in case there is no difference. See class documentation for more explanation.
Note that this value is different from the JDK's.
Constant Value: 15 (0x0000000f)
int NO_DECOMPOSITION
Decomposition mode value. With NO_DECOMPOSITION set, Strings will not be decomposed for collation. This is the default decomposition setting unless otherwise specified by the locale used to create the Collator.
Note this value is different from the JDK's.
Constant Value: 16 (0x00000010)
int PRIMARY
Strongest collator strength value. Typically used to denote differences between base characters. See class documentation for more explanation.
Constant Value: 0 (0x00000000)
int QUATERNARY
[icu] Fourth level collator strength value. When punctuation is ignored (see Ignoring Punctuation in the User Guide) at PRIMARY to TERTIARY strength, an additional strength level can be used to distinguish words with and without punctuation. See class documentation for more explanation.
Constant Value: 3 (0x00000003)
int SECONDARY
Second level collator strength value. Accents in the characters are considered secondary differences. Other differences between letters can also be considered secondary differences, depending on the language. See class documentation for more explanation.
Constant Value: 1 (0x00000001)
int TERTIARY
Third level collator strength value. Upper and lower case differences in characters are distinguished at this strength level. In addition, a variant of a letter differs from the base form on the tertiary level. See class documentation for more explanation.
Constant Value: 2 (0x00000002)
Object clone ()
Clones the collator.
Returns | |
---|---|
Object |
a clone of this collator. |
Throws | |
---|---|
CloneNotSupportedException |
Collator cloneAsThawed ()
Provides for the clone operation. Any clone is initially unfrozen.
Returns | |
---|---|
Collator |
int compare (Object source, Object target)
Compares the source Object to the target Object.
Parameters | |
---|---|
source |
Object :
the source Object. |
target |
Object :
the target Object. |
Returns | |
---|---|
int |
Returns an integer value. Value is less than zero if source is less than target, value is zero if source and target are equal, value is greater than zero if source is greater than target. |
Throws | |
---|---|
ClassCastException |
thrown if either argument cannot be cast to CharSequence. |
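Because Collator implements Comparator<Object>, an instance can be passed wherever a Comparator is accepted. The following is a minimal sketch rather than the library's own sample: it uses the JDK's java.text.Collator as a stand-in so that it runs without ICU on the classpath, on the assumption that switching the import to android.icu.text.Collator leaves these calls unchanged. The class name CollatorSortSketch and the sample words are illustrative only.

```java
import java.text.Collator; // stand-in; assumed swappable for android.icu.text.Collator
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class CollatorSortSketch {
    /** Returns a new list sorted with the collator for the given locale. */
    static List<String> sortForLocale(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(words);
        // Collator is a Comparator<Object>, so List.sort accepts it directly.
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "Cherry", "apple");
        System.out.println(sortForLocale(words, Locale.US));
    }
}
```

A plain String.compareTo sort would place "Cherry" first, because uppercase code points precede lowercase ones; the collator orders the words the way a reader of that locale would expect.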
int compare (String source, String target)
Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode. Returns an integer less than, equal to or greater than zero depending on whether the source String is less than, equal to or greater than the target String. See the Collator class description for an example of use.
Parameters | |
---|---|
source |
String :
the source String. |
target |
String :
the target String. |
Returns | |
---|---|
int |
Returns an integer value. Value is less than zero if source is less than target, value is zero if source and target are equal, value is greater than zero if source is greater than target. |
Throws | |
---|---|
NullPointerException |
thrown if either argument is null. |
boolean equals (String source, String target)
Compares the equality of two text Strings using this Collator's rules, strength and decomposition mode. Convenience method.
Parameters | |
---|---|
source |
String :
the source string to be compared. |
target |
String :
the target string to be compared. |
Returns | |
---|---|
boolean |
true if the strings are equal according to the collation rules, otherwise false. |
Throws | |
---|---|
NullPointerException |
thrown if either argument is null. |
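As a rough illustration of this convenience method, the sketch below checks whether two strings share the same base letters by lowering the strength to PRIMARY before calling equals(String, String). It is written against the JDK's java.text.Collator as a stand-in (an assumption made so the example runs without ICU); the class and method names are hypothetical.

```java
import java.text.Collator; // stand-in; assumed swappable for android.icu.text.Collator
import java.util.Locale;

final class CollatorEqualsSketch {
    /** True when the strings differ at most by case or accents (PRIMARY strength). */
    static boolean sameBaseLetters(String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(Collator.PRIMARY);
        // equals(String, String) is shorthand for compare(a, b) == 0.
        return collator.equals(a, b);
    }

    public static void main(String[] args) {
        System.out.println(sameBaseLetters("resume", "R\u00E9sum\u00E9"));
        System.out.println(sameBaseLetters("resume", "resumes"));
    }
}
```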
boolean equals (Object obj)
Compares the equality of two Collator objects. Collator objects are equal if they have the same collation (sorting & searching) behavior.
The base class checks for null and for equal types. Subclasses should override.
Parameters | |
---|---|
obj |
Object :
the Collator to compare to. |
Returns | |
---|---|
boolean |
true if this Collator has exactly the same collation behavior as obj, false otherwise. |
Collator freeze ()
Freezes the collator.
Returns | |
---|---|
Collator |
the collator itself. |
Locale[] getAvailableLocales ()
Returns the set of locales, as Locale objects, for which collators are installed. Note that Locale objects do not support RFC 3066.
Returns | |
---|---|
Locale[] |
the list of locales in which collators are installed. This list includes any that have been registered, in addition to those that are installed with ICU4J. |
ULocale[] getAvailableULocales ()
[icu] Returns the set of locales, as ULocale objects, for which collators are installed. ULocale objects support RFC 3066.
Returns | |
---|---|
ULocale[] |
the list of locales in which collators are installed. This list includes any that have been registered, in addition to those that are installed with ICU4J. |
CollationKey getCollationKey (String source)
Transforms the String into a CollationKey suitable for efficient repeated comparison. The resulting key depends on the collator's rules, strength and decomposition mode.
Note that collation keys are often less efficient than simply doing comparison. For more details, see the ICU User Guide.
See the CollationKey class documentation for more information.
Parameters | |
---|---|
source |
String :
the string to be transformed into a CollationKey. |
Returns | |
---|---|
CollationKey |
the CollationKey for the given String based on this Collator's collation rules. If the source String is null, a null CollationKey is returned. |
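One way to use collation keys is to compute a key per string once and then sort the keys. The sketch below assumes the JDK's java.text.Collator and java.text.CollationKey as stand-ins for the android.icu.text classes (both key types are Comparable and expose getSourceString()); CollationKeySortSketch is a hypothetical name. Whether this is faster than calling compare directly depends on the data, as noted above.

```java
import java.text.CollationKey; // stand-ins; assumed swappable for the android.icu.text classes
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class CollationKeySortSketch {
    /** Sorts by precomputed collation keys; the input must not contain null strings. */
    static List<String> sortWithKeys(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        CollationKey[] keys = new CollationKey[words.size()];
        for (int i = 0; i < keys.length; i++) {
            // Each string is transformed exactly once.
            keys[i] = collator.getCollationKey(words.get(i));
        }
        // CollationKey is Comparable, so sorting compares the keys themselves.
        Arrays.sort(keys);
        List<String> sorted = new ArrayList<>(keys.length);
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortWithKeys(Arrays.asList("banana", "Cherry", "apple"), Locale.US));
    }
}
```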
int getDecomposition ()
Returns the decomposition mode of this Collator. The decomposition mode determines how Unicode composed characters are handled.
See the Collator class description for more details.
The base class method always returns NO_DECOMPOSITION. Subclasses should override it if appropriate.
Returns | |
---|---|
int |
the decomposition mode |
String getDisplayName (Locale objectLocale, Locale displayLocale)
[icu] Returns the name of the collator for the objectLocale, localized for the displayLocale.
Parameters | |
---|---|
objectLocale |
Locale :
the locale of the collator |
displayLocale |
Locale :
the locale for the collator's display name |
Returns | |
---|---|
String |
the display name |
String getDisplayName (ULocale objectLocale)
[icu] Returns the name of the collator for the objectLocale, localized for the default DISPLAY locale.
Parameters | |
---|---|
objectLocale |
ULocale :
the locale of the collator |
Returns | |
---|---|
String |
the display name |
String getDisplayName (ULocale objectLocale, ULocale displayLocale)
[icu] Returns the name of the collator for the objectLocale, localized for the displayLocale.
Parameters | |
---|---|
objectLocale |
ULocale :
the locale of the collator |
displayLocale |
ULocale :
the locale for the collator's display name |
Returns | |
---|---|
String |
the display name |
String getDisplayName (Locale objectLocale)
[icu] Returns the name of the collator for the objectLocale, localized for the default DISPLAY locale.
Parameters | |
---|---|
objectLocale |
Locale :
the locale of the collator |
Returns | |
---|---|
String |
the display name |
int[] getEquivalentReorderCodes (int reorderCode)
Retrieves all the reorder codes that are grouped with the given reorder code. Some reorder codes are grouped and must reorder together. Beginning with ICU 55, scripts only reorder together if they are primary-equal, for example Hiragana and Katakana.
Parameters | |
---|---|
reorderCode |
int :
The reorder code to determine equivalence for. |
Returns | |
---|---|
int[] |
the set of all reorder codes in the same group as the given reorder code. |
ULocale getFunctionalEquivalent (String keyword, ULocale locID)
[icu] Returns the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service.
Parameters | |
---|---|
keyword |
String :
a particular keyword as enumerated by getKeywords. |
locID |
ULocale :
The requested locale |
Returns | |
---|---|
ULocale |
the locale |
ULocale getFunctionalEquivalent (String keyword, ULocale locID, boolean[] isAvailable)
[icu] Returns the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service. If two locales return the same result, then collators instantiated for these locales will behave equivalently. The converse is not always true; two collators may in fact be equivalent, but return different results, due to internal details. The return result has no other meaning than that stated above, and implies nothing as to the relationship between the two locales. This is intended for use by applications that wish to cache collators, or otherwise reuse collators when possible. The functional equivalent may change over time. For more information, please see the Locales and Services section of the ICU User Guide.
Parameters | |
---|---|
keyword |
String :
a particular keyword as enumerated by getKeywords. |
locID |
ULocale :
The requested locale |
isAvailable |
boolean[] :
If non-null, isAvailable[0] will receive an output boolean that indicates whether the requested locale was 'available' to the collation service. If non-null, isAvailable must have length >= 1. |
Returns | |
---|---|
ULocale |
the locale |
Collator getInstance ()
Returns the Collator for the current default locale. The default locale is determined by java.util.Locale.getDefault().
Returns | |
---|---|
Collator |
the Collator for the default locale (for example, en_US) if it is created successfully. Otherwise if there is no Collator associated with the current locale, the root collator will be returned. |
Collator getInstance (Locale locale)
Returns the Collator for the desired locale.
For some languages, multiple collation types are available; for example, "de-u-co-phonebk". Starting with ICU 54, collation attributes can be specified via locale keywords as well, in the old locale extension syntax ("el@colCaseFirst=upper", only with ULocale) or in language tag syntax ("el-u-kf-upper"). See User Guide: Collation API.
Parameters | |
---|---|
locale |
Locale :
the desired locale. |
Returns | |
---|---|
Collator |
Collator for the desired locale if it is created successfully. Otherwise if there is no Collator associated with the current locale, the root collator will be returned. |
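The language tags mentioned above can be turned into Locale objects with the standard Locale.forLanguageTag method. The sketch below only builds and inspects the locales, so it runs on a plain JDK; handing the result to getInstance(Locale) is shown as a comment because it assumes the ICU Collator is on the classpath. CollationLocaleSketch and fromTag are hypothetical names.

```java
import java.util.Locale;

final class CollationLocaleSketch {
    /** Builds a Locale from a BCP 47 language tag such as "de-u-co-phonebk". */
    static Locale fromTag(String languageTag) {
        return Locale.forLanguageTag(languageTag);
    }

    public static void main(String[] args) {
        Locale phonebook = fromTag("de-u-co-phonebk");
        Locale upperFirst = fromTag("el-u-kf-upper");
        // The Unicode extension keywords are preserved in the Locale object.
        System.out.println(phonebook.getUnicodeLocaleType("co"));
        System.out.println(upperFirst.getUnicodeLocaleType("kf"));
        // With the ICU class imported, the locale would then be passed on:
        // Collator collator = Collator.getInstance(phonebook);
    }
}
```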
Collator getInstance (ULocale locale)
[icu] Returns the Collator for the desired locale.
For some languages, multiple collation types are available; for example, "de@collation=phonebook". Starting with ICU 54, collation attributes can be specified via locale keywords as well, in the old locale extension syntax ("el@colCaseFirst=upper") or in language tag syntax ("el-u-kf-upper"). See User Guide: Collation API.
Parameters | |
---|---|
locale |
ULocale :
the desired locale. |
Returns | |
---|---|
Collator |
Collator for the desired locale if it is created successfully. Otherwise if there is no Collator associated with the current locale, the root collator will be returned. |
String[] getKeywordValues (String keyword)
[icu] Given a keyword, returns an array of all values for that keyword that are currently in use.
Parameters | |
---|---|
keyword |
String :
one of the keywords returned by getKeywords. |
Returns | |
---|---|
String[] |
String[] getKeywordValuesForLocale (String key, ULocale locale, boolean commonlyUsed)
[icu] Given a key and a locale, returns an array of string values in a preferred order that would make a difference. These are all and only those values where the open (creation) of the service with the locale formed from the input locale plus input keyword and that value has different behavior than creation with the input locale alone.
Parameters | |
---|---|
key |
String :
one of the keys supported by this service. For now, only "collation" is supported. |
locale |
ULocale :
the locale |
commonlyUsed |
boolean :
if set to true, it will return only commonly used values with the given locale in preferred order. Otherwise, it will return all the available values for the locale. |
Returns | |
---|---|
String[] |
an array of string values for the given key and the locale. |
String[] getKeywords ()
[icu] Returns an array of all possible keywords that are relevant to collation. At this point, the only recognized keyword for this service is "collation".
Returns | |
---|---|
String[] |
an array of valid collation keywords. |
int getMaxVariable ()
[icu] Returns the maximum reordering group whose characters are affected by the alternate handling behavior.
The base class implementation returns Collator.ReorderCodes.PUNCTUATION.
Returns | |
---|---|
int |
the maximum variable reordering group. |
int[] getReorderCodes ()
Retrieves the reordering codes for this collator. These reordering codes are a combination of UScript codes and ReorderCodes.
Returns | |
---|---|
int[] |
a copy of the reordering codes for this collator; if none are set then returns an empty array |
int getStrength ()
Returns this Collator's strength attribute. The strength attribute determines the minimum level of difference considered significant. [icu] Note: This can return QUATERNARY strength, which is not supported by the JDK version.
See the Collator class description for more details.
The base class method always returns TERTIARY. Subclasses should override it if appropriate.
Returns | |
---|---|
int |
this Collator's current strength attribute. |
UnicodeSet getTailoredSet ()
[icu] Returns a UnicodeSet that contains all the characters and sequences tailored in this collator.
Returns | |
---|---|
UnicodeSet |
a UnicodeSet object containing all the code points and sequences that may sort differently than in the root collator. |
VersionInfo getUCAVersion ()
[icu] Returns the UCA version of this collator object.
Returns | |
---|---|
VersionInfo |
the version object associated with this collator |
int getVariableTop ()
[icu] Gets the variable top value of a Collator.
Returns | |
---|---|
int |
the variable top primary weight |
VersionInfo getVersion ()
[icu] Returns the version of this collator object.
Returns | |
---|---|
VersionInfo |
the version object associated with this collator |
boolean isFrozen ()
Determines whether the object has been frozen or not.
An unfrozen Collator is mutable and not thread-safe. A frozen Collator is immutable and thread-safe.
Returns | |
---|---|
boolean |
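The freeze contract described above can be modelled in a few lines. The class below is a hypothetical stand-in, not ICU code: it imitates the freeze(), isFrozen() and cloneAsThawed() calls listed on this page with a single strength attribute, and it assumes that setters on a frozen object fail with UnsupportedOperationException.

```java
final class FreezableSettingsSketch {
    private int strength = 2;       // hypothetical attribute; 2 is the TERTIARY constant value
    private boolean frozen = false;

    boolean isFrozen() {
        return frozen;
    }

    /** Makes this object immutable and returns it, so the call can be chained. */
    FreezableSettingsSketch freeze() {
        frozen = true;
        return this;
    }

    /** Returns a mutable copy; the original stays frozen. */
    FreezableSettingsSketch cloneAsThawed() {
        FreezableSettingsSketch copy = new FreezableSettingsSketch();
        copy.strength = strength;
        return copy;
    }

    int getStrength() {
        return strength;
    }

    void setStrength(int newStrength) {
        if (frozen) {
            // Assumed behavior: a frozen object rejects modification.
            throw new UnsupportedOperationException("Attempt to modify frozen object");
        }
        strength = newStrength;
    }

    public static void main(String[] args) {
        FreezableSettingsSketch shared = new FreezableSettingsSketch().freeze();
        FreezableSettingsSketch mine = shared.cloneAsThawed();
        mine.setStrength(0); // allowed: the clone is unfrozen
        System.out.println(shared.getStrength() + " " + mine.getStrength());
    }
}
```

The intended usage pattern is to freeze one shared instance for use across threads and to call cloneAsThawed() whenever a thread needs different settings.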
void setDecomposition (int decomposition)
Sets the decomposition mode of this Collator. Setting this decomposition attribute with CANONICAL_DECOMPOSITION allows the Collator to handle un-normalized text properly, producing the same results as if the text were normalized. If NO_DECOMPOSITION is set, it is the user's responsibility to ensure that all text is already in the appropriate form before a comparison or before getting a CollationKey. Adjusting decomposition mode allows the user to select between faster and more complete collation behavior.
Since a great many of the world's languages do not require text normalization, most locales set NO_DECOMPOSITION as the default decomposition mode.
The base class method does nothing. Subclasses should override it if appropriate.
See getDecomposition for a description of decomposition mode.
Parameters | |
---|---|
decomposition |
int :
the new decomposition mode |
Throws | |
---|---|
IllegalArgumentException |
If the given value is not a valid decomposition mode. |
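A small sketch of the effect of canonical decomposition follows. It uses the JDK's java.text.Collator as a stand-in (an assumption; the constant values differ from ICU's, so only the named constants are used), and the class and method names are hypothetical. It compares a precomposed character with its decomposed spelling and probes how an invalid mode value is handled.

```java
import java.text.Collator; // stand-in; assumed swappable for android.icu.text.Collator
import java.util.Locale;

final class DecompositionSketch {
    /** Compares two strings with canonical decomposition switched on. */
    static int compareWithCanonicalDecomposition(String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b);
    }

    /** Returns true if the collator rejects the given decomposition mode value. */
    static boolean rejectsMode(int mode) {
        try {
            Collator.getInstance(Locale.US).setDecomposition(mode);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9"; // LATIN SMALL LETTER E WITH ACUTE
        String decomposed = "e\u0301"; // e followed by COMBINING ACUTE ACCENT
        System.out.println(compareWithCanonicalDecomposition(precomposed, decomposed));
        System.out.println(rejectsMode(-1));
    }
}
```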
Collator setMaxVariable (int group)
[icu] Sets the variable top to the top of the specified reordering group. The variable top determines the highest-sorting character which is affected by the alternate handling behavior. If that attribute is set to UCOL_NON_IGNORABLE, then the variable top has no effect.
The base class implementation throws an UnsupportedOperationException.
Parameters | |
---|---|
group |
int :
one of Collator.ReorderCodes.SPACE, Collator.ReorderCodes.PUNCTUATION, Collator.ReorderCodes.SYMBOL, Collator.ReorderCodes.CURRENCY; or Collator.ReorderCodes.DEFAULT to restore the default max variable group |
Returns | |
---|---|
Collator |
this |
void setReorderCodes (int... order)
Sets the reordering codes for this collator.
Collation reordering allows scripts and some other groups of characters to be moved relative to each other. This reordering is done on top of the DUCET/CLDR standard collation order. Reordering can specify groups to be placed at the start and/or the end of the collation order. These groups are specified using UScript codes and Collator.ReorderCodes entries.
By default, reordering codes specified for the start of the order are placed in the order given after several special non-script blocks. These special groups of characters are space, punctuation, symbol, currency, and digit. These special groups are represented with Collator.ReorderCodes entries. Script groups can be intermingled with these special non-script groups if those special groups are explicitly specified in the reordering.
The special code OTHERS stands for any script that is not explicitly mentioned in the list of reordering codes given. Anything that is after OTHERS will go at the very end of the reordering in the order given.
The special reorder code DEFAULT will reset the reordering for this collator to the default for this collator. The default reordering may be the DUCET/CLDR order or may be a reordering that was specified when this collator was created from resource data or from rules. The DEFAULT code must be the sole code supplied when it is used. If not, then an IllegalArgumentException will be thrown.
The special reorder code NONE will remove any reordering for this collator. The result of setting no reordering will be to have the DUCET/CLDR ordering used. The NONE code must be the sole code supplied when it is used.
Parameters | |
---|---|
order |
int :
the reordering codes to apply to this collator; if this is null or an empty array then this clears any existing reordering |
void setStrength (int newStrength)
Sets this Collator's strength attribute. The strength attribute determines the minimum level of difference considered significant during comparison.
The base class method does nothing. Subclasses should override it if appropriate.
See the Collator class description for an example of use.
Parameters | |
---|---|
newStrength |
int :
the new strength value. |
Throws | |
---|---|
IllegalArgumentException |
if the new strength value is not valid. |
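To make the strength levels concrete, the sketch below compares the same pairs of strings at different strengths. It is written against the JDK's java.text.Collator as a stand-in so that it runs without ICU (an assumption; the JDK class has no QUATERNARY level, so only PRIMARY to TERTIARY are shown). StrengthSketch and compareAt are hypothetical names.

```java
import java.text.Collator; // stand-in; assumed swappable for android.icu.text.Collator
import java.util.Locale;

final class StrengthSketch {
    /** Compares two strings at the given strength using the US English collator. */
    static int compareAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        // An invalid strength value is rejected with IllegalArgumentException.
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // Accent difference: ignored at PRIMARY, significant from SECONDARY upward.
        System.out.println(compareAt(Collator.PRIMARY, "role", "r\u00F4le") == 0);
        System.out.println(compareAt(Collator.SECONDARY, "role", "r\u00F4le") == 0);
        // Case difference: ignored at SECONDARY, significant at TERTIARY.
        System.out.println(compareAt(Collator.SECONDARY, "abc", "ABC") == 0);
        System.out.println(compareAt(Collator.TERTIARY, "abc", "ABC") == 0);
    }
}
```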