Added in API level 24

Collator

public abstract class Collator
extends Object implements Comparator<Object>, Freezable<Collator>, Cloneable

java.lang.Object
   ↳ android.icu.text.Collator
Known Direct Subclasses
RuleBasedCollator


[icu enhancement] ICU's replacement for java.text.Collator. Methods, fields, and other functionality specific to ICU are labeled '[icu]'.

Collator performs locale-sensitive string comparison. A concrete subclass, RuleBasedCollator, allows customization of the collation ordering by the use of rule sets.

A Collator is thread-safe only when frozen. See isFrozen() and Freezable.

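Following the Unicode Consortium's specifications for the Unicode Collation Algorithm (UCA), there are 5 different levels of strength used in comparisons:

PRIMARY: typically denotes differences between base characters. It is the strongest difference.
SECONDARY: accents in the characters are considered secondary differences. Other differences between letters can also be considered secondary, depending on the language.
TERTIARY: upper and lower case differences are distinguished at this level, and a variant of a letter differs from its base form.
QUATERNARY: when punctuation is ignored at PRIMARY to TERTIARY strength, this level distinguishes words with and without punctuation.
IDENTICAL: when all other strengths are equal, this level is used as a tiebreaker.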

Unlike the JDK, ICU4J's Collator supports only two decomposition modes: canonical decomposition and no decomposition. The compatibility decomposition mode, java.text.Collator.FULL_DECOMPOSITION, is not supported here. If the canonical decomposition mode is set, the Collator handles un-normalized text properly, producing the same results as if the text were normalized in NFD. If canonical decomposition is turned off, it is the user's responsibility to ensure that all text is already in the appropriate form before performing a comparison or before getting a CollationKey.

For more information about the collation service see the User Guide.

Examples of use

 // Get the Collator for US English and set its strength to PRIMARY
 Collator usCollator = Collator.getInstance(Locale.US);
 usCollator.setStrength(Collator.PRIMARY);
 if (usCollator.compare("abc", "ABC") == 0) {
     System.out.println("Strings are equivalent");
 }

 The following example shows how to compare two strings using the
 Collator for the default locale.

 // Compare two strings in the default locale
 Collator myCollator = Collator.getInstance();
 // The decomposition constants are static fields of Collator
 myCollator.setDecomposition(Collator.NO_DECOMPOSITION);
 if (myCollator.compare("\u00e0\u0325", "a\u0325\u0300") != 0) {
     System.out.println("\u00e0\u0325 is not equal to a\u0325\u0300 without decomposition");
     myCollator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
     if (myCollator.compare("\u00e0\u0325", "a\u0325\u0300") != 0) {
         System.out.println("Error: \u00e0\u0325 should be equal to a\u0325\u0300 with decomposition");
     }
     else {
         System.out.println("\u00e0\u0325 is equal to a\u0325\u0300 with decomposition");
     }
 }
 else {
     System.out.println("Error: \u00e0\u0325 should not be equal to a\u0325\u0300 without decomposition");
 }
 

Summary

Nested classes

interface Collator.ReorderCodes

Reordering codes for non-script groups that can be reordered under collation. 

Constants

int CANONICAL_DECOMPOSITION

Decomposition mode value.

int FULL_DECOMPOSITION

[icu] Note: This is for backwards compatibility with Java APIs only.

int IDENTICAL

Smallest Collator strength value.

int NO_DECOMPOSITION

Decomposition mode value.

int PRIMARY

Strongest collator strength value.

int QUATERNARY

[icu] Fourth level collator strength value.

int SECONDARY

Second level collator strength value.

int TERTIARY

Third level collator strength value.

Protected constructors

Collator()

Empty default constructor to make javadocs happy

Public methods

Object clone()

Clones the collator.

Collator cloneAsThawed()

Provides for the clone operation.

int compare(Object source, Object target)

Compares the source Object to the target Object.

abstract int compare(String source, String target)

Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode.

boolean equals(String source, String target)

Compares the equality of two text Strings using this Collator's rules, strength and decomposition mode.

boolean equals(Object obj)

Compares the equality of two Collator objects.

Collator freeze()

Freezes the collator.

static Locale[] getAvailableLocales()

Returns the set of locales, as Locale objects, for which collators are installed.

static final ULocale[] getAvailableULocales()

[icu] Returns the set of locales, as ULocale objects, for which collators are installed.

abstract CollationKey getCollationKey(String source)

Transforms the String into a CollationKey suitable for efficient repeated comparison.

int getDecomposition()

Returns the decomposition mode of this Collator.

static String getDisplayName(Locale objectLocale, Locale displayLocale)

[icu] Returns the name of the collator for the objectLocale, localized for the displayLocale.

static String getDisplayName(ULocale objectLocale)

[icu] Returns the name of the collator for the objectLocale, localized for the default DISPLAY locale.

static String getDisplayName(ULocale objectLocale, ULocale displayLocale)

[icu] Returns the name of the collator for the objectLocale, localized for the displayLocale.

static String getDisplayName(Locale objectLocale)

[icu] Returns the name of the collator for the objectLocale, localized for the default DISPLAY locale.

static int[] getEquivalentReorderCodes(int reorderCode)

Retrieves all the reorder codes that are grouped with the given reorder code.

static final ULocale getFunctionalEquivalent(String keyword, ULocale locID)

[icu] Returns the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service.

static final ULocale getFunctionalEquivalent(String keyword, ULocale locID, boolean[] isAvailable)

[icu] Returns the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service.

static final Collator getInstance()

Returns the Collator for the current default locale.

static final Collator getInstance(Locale locale)

Returns the Collator for the desired locale.

static final Collator getInstance(ULocale locale)

[icu] Returns the Collator for the desired locale.

static final String[] getKeywordValues(String keyword)

[icu] Given a keyword, returns an array of all values for that keyword that are currently in use.

static final String[] getKeywordValuesForLocale(String key, ULocale locale, boolean commonlyUsed)

[icu] Given a key and a locale, returns an array of string values in a preferred order that would make a difference.

static final String[] getKeywords()

[icu] Returns an array of all possible keywords that are relevant to collation.

int getMaxVariable()

[icu] Returns the maximum reordering group whose characters are affected by the alternate handling behavior.

int[] getReorderCodes()

Retrieves the reordering codes for this collator.

int getStrength()

Returns this Collator's strength attribute.

UnicodeSet getTailoredSet()

[icu] Returns a UnicodeSet that contains all the characters and sequences tailored in this collator.

abstract VersionInfo getUCAVersion()

[icu] Returns the UCA version of this collator object.

abstract int getVariableTop()

[icu] Gets the variable top value of a Collator.

abstract VersionInfo getVersion()

[icu] Returns the version of this collator object.

boolean isFrozen()

Determines whether the object has been frozen or not.

void setDecomposition(int decomposition)

Sets the decomposition mode of this Collator.

Collator setMaxVariable(int group)

[icu] Sets the variable top to the top of the specified reordering group.

void setReorderCodes(int... order)

Sets the reordering codes for this collator.

void setStrength(int newStrength)

Sets this Collator's strength attribute.

Inherited methods

From class java.lang.Object
From interface java.util.Comparator
From interface android.icu.util.Freezable

Constants

CANONICAL_DECOMPOSITION

Added in API level 24
int CANONICAL_DECOMPOSITION

Decomposition mode value. With CANONICAL_DECOMPOSITION set, characters that are canonical variants according to the Unicode standard will be decomposed for collation.

CANONICAL_DECOMPOSITION corresponds to Normalization Form D as described in Unicode Technical Report #15.

Constant Value: 17 (0x00000011)

FULL_DECOMPOSITION

Added in API level 24
int FULL_DECOMPOSITION

[icu] Note: This is for backwards compatibility with Java APIs only. It should not be used; use IDENTICAL instead. ICU's collation does not support Java's FULL_DECOMPOSITION mode.

Constant Value: 15 (0x0000000f)

IDENTICAL

Added in API level 24
int IDENTICAL

Smallest Collator strength value. When all other strengths are equal, the IDENTICAL strength is used as a tiebreaker. The Unicode code point values of the NFD form of each string are compared only if no difference is found at the other strength levels. See class documentation for more explanation.

Note that this value is different from the JDK's.

Constant Value: 15 (0x0000000f)

NO_DECOMPOSITION

Added in API level 24
int NO_DECOMPOSITION

Decomposition mode value. With NO_DECOMPOSITION set, Strings will not be decomposed for collation. This is the default decomposition setting unless otherwise specified by the locale used to create the Collator.

Note this value is different from the JDK's.

Constant Value: 16 (0x00000010)

PRIMARY

Added in API level 24
int PRIMARY

Strongest collator strength value. Typically used to denote differences between base characters. See class documentation for more explanation.

Constant Value: 0 (0x00000000)

QUATERNARY

Added in API level 24
int QUATERNARY

[icu] Fourth level collator strength value. When punctuation is ignored (see Ignoring Punctuation in the User Guide) at PRIMARY to TERTIARY strength, an additional strength level can be used to distinguish words with and without punctuation. See class documentation for more explanation.

Constant Value: 3 (0x00000003)

SECONDARY

Added in API level 24
int SECONDARY

Second level collator strength value. Accents in the characters are considered secondary differences. Other differences between letters can also be considered secondary differences, depending on the language. See class documentation for more explanation.

Constant Value: 1 (0x00000001)

TERTIARY

Added in API level 24
int TERTIARY

Third level collator strength value. Upper and lower case differences in characters are distinguished at this strength level. In addition, a variant of a letter differs from the base form on the tertiary level. See class documentation for more explanation.

Constant Value: 2 (0x00000002)

Protected constructors

Collator

Added in API level 24
Collator ()

Empty default constructor to make javadocs happy

Public methods

clone

Added in API level 24
Object clone ()

Clones the collator.

Returns
Object a clone of this collator.
Throws
CloneNotSupportedException

cloneAsThawed

Added in API level 24
Collator cloneAsThawed ()

Provides for the clone operation. Any clone is initially unfrozen.

Returns
Collator

compare

Added in API level 24
int compare (Object source, 
                Object target)

Compares the source Object to the target Object.

Parameters
source Object: the source Object.
target Object: the target Object.
Returns
int Returns an integer value. Value is less than zero if source is less than target, value is zero if source and target are equal, value is greater than zero if source is greater than target.
Throws
ClassCastException thrown if either argument cannot be cast to CharSequence.

compare

Added in API level 24
int compare (String source, 
                String target)

Compares the source text String to the target text String according to this Collator's rules, strength and decomposition mode. Returns an integer less than, equal to or greater than zero depending on whether the source String is less than, equal to or greater than the target String. See the Collator class description for an example of use.

Parameters
source String: the source String.
target String: the target String.
Returns
int Returns an integer value. Value is less than zero if source is less than target, value is zero if source and target are equal, value is greater than zero if source is greater than target.
Throws
NullPointerException thrown if either argument is null.
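
Because Collator implements Comparator<Object>, an instance can be handed directly to a sort. The sketch below is a minimal illustration, not a definitive usage pattern. It assumes a plain JVM and therefore imports java.text.Collator, whose getInstance and compare calls mirror the ones documented here; on Android (API level 24 and later) the import would be android.icu.text.Collator instead. The class and method names are hypothetical.

```java
import java.text.Collator; // assumption: plain JVM; on Android use android.icu.text.Collator
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CollatorSortSketch {
    /** Returns a new list holding the words in US English collation order. */
    static List<String> sortForUs(List<String> words) {
        Collator collator = Collator.getInstance(Locale.US);
        List<String> sorted = new ArrayList<>(words);
        // The collator itself is the comparator; each pair goes through compare().
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cherry", "apple", "Banana");

        // Natural String order compares UTF-16 code units, so "Banana" comes first.
        List<String> natural = new ArrayList<>(words);
        natural.sort(null);
        System.out.println(natural);          // [Banana, apple, cherry]

        // Collation order puts the base letters first: a < b < c.
        System.out.println(sortForUs(words)); // [apple, Banana, cherry]
    }
}
```

The difference between the two printed lists is the point: the base-letter (primary) difference decides the order before case is considered.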

equals

Added in API level 24
boolean equals (String source, 
                String target)

Compares the equality of two text Strings using this Collator's rules, strength and decomposition mode. Convenience method.

Parameters
source String: the source string to be compared.
target String: the target string to be compared.
Returns
boolean true if the strings are equal according to the collation rules, otherwise false.
Throws
NullPointerException thrown if either argument is null.

equals

Added in API level 24
boolean equals (Object obj)

Compares the equality of two Collator objects. Collator objects are equal if they have the same collation (sorting & searching) behavior.

The base class checks for null and for equal types. Subclasses should override.

Parameters
obj Object: the Collator to compare to.
Returns
boolean true if this Collator has exactly the same collation behavior as obj, false otherwise.

freeze

Added in API level 24
Collator freeze ()

Freezes the collator.

Returns
Collator the collator itself.

getAvailableLocales

Added in API level 24
Locale[] getAvailableLocales ()

Returns the set of locales, as Locale objects, for which collators are installed. Note that Locale objects do not support RFC 3066.

Returns
Locale[] the list of locales in which collators are installed. This list includes any that have been registered, in addition to those that are installed with ICU4J.

getAvailableULocales

Added in API level 24
ULocale[] getAvailableULocales ()

[icu] Returns the set of locales, as ULocale objects, for which collators are installed. ULocale objects support RFC 3066.

Returns
ULocale[] the list of locales in which collators are installed. This list includes any that have been registered, in addition to those that are installed with ICU4J.

getCollationKey

Added in API level 24
CollationKey getCollationKey (String source)

Transforms the String into a CollationKey suitable for efficient repeated comparison. The resulting key depends on the collator's rules, strength and decomposition mode.

Note that collation keys are often less efficient than simply doing comparison. For more details, see the ICU User Guide.

See the CollationKey class documentation for more information.

Parameters
source String: the string to be transformed into a CollationKey.
Returns
CollationKey the CollationKey for the given String based on this Collator's collation rules. If the source String is null, a null CollationKey is returned.
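
Collation keys are meant for cases where the same strings are compared repeatedly, such as sorting a list. The following is a minimal sketch of that idea, assuming a plain JVM: it uses java.text.Collator and java.text.CollationKey, which offer the same getCollationKey, compareTo and getSourceString calls; on Android the corresponding classes live in android.icu.text. The class and method names are hypothetical.

```java
import java.text.CollationKey; // assumption: plain JVM; on Android use android.icu.text.CollationKey
import java.text.Collator;     // assumption: plain JVM; on Android use android.icu.text.Collator
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class CollationKeySortSketch {
    /** Sorts the words by comparing precomputed collation keys. */
    static List<String> sortWithKeys(List<String> words) {
        Collator collator = Collator.getInstance(Locale.US);

        // Build one key per string; the collation rules are applied here, once.
        List<CollationKey> keys = new ArrayList<>();
        for (String word : words) {
            keys.add(collator.getCollationKey(word));
        }

        // CollationKey is Comparable, so the sort only compares the keys.
        Collections.sort(keys);

        // Recover the original strings in sorted order.
        List<String> sorted = new ArrayList<>();
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortWithKeys(Arrays.asList("cherry", "apple", "Banana")));
        // [apple, Banana, cherry]
    }
}
```

Keys produced by different collators are not comparable with each other, so all keys in one sort should come from the same Collator instance. For a handful of one-off comparisons, calling compare directly is usually the simpler choice.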

getDecomposition

Added in API level 24
int getDecomposition ()

Returns the decomposition mode of this Collator. The decomposition mode determines how Unicode composed characters are handled.

See the Collator class description for more details.

The base class method always returns NO_DECOMPOSITION. Subclasses should override it if appropriate.

Returns
int the decomposition mode

getDisplayName

Added in API level 24
String getDisplayName (Locale objectLocale, 
                Locale displayLocale)

[icu] Returns the name of the collator for the objectLocale, localized for the displayLocale.

Parameters
objectLocale Locale: the locale of the collator
displayLocale Locale: the locale for the collator's display name
Returns
String the display name

getDisplayName

Added in API level 24
String getDisplayName (ULocale objectLocale)

[icu] Returns the name of the collator for the objectLocale, localized for the default DISPLAY locale.

Parameters
objectLocale ULocale: the locale of the collator
Returns
String the display name

getDisplayName

Added in API level 24
String getDisplayName (ULocale objectLocale, 
                ULocale displayLocale)

[icu] Returns the name of the collator for the objectLocale, localized for the displayLocale.

Parameters
objectLocale ULocale: the locale of the collator
displayLocale ULocale: the locale for the collator's display name
Returns
String the display name

getDisplayName

Added in API level 24
String getDisplayName (Locale objectLocale)

[icu] Returns the name of the collator for the objectLocale, localized for the default DISPLAY locale.

Parameters
objectLocale Locale: the locale of the collator
Returns
String the display name

getEquivalentReorderCodes

Added in API level 24
int[] getEquivalentReorderCodes (int reorderCode)

Retrieves all the reorder codes that are grouped with the given reorder code. Some reorder codes are grouped and must reorder together. Beginning with ICU 55, scripts only reorder together if they are primary-equal, for example Hiragana and Katakana.

Parameters
reorderCode int: The reorder code to determine equivalence for.
Returns
int[] the set of all reorder codes in the same group as the given reorder code.

getFunctionalEquivalent

Added in API level 24
ULocale getFunctionalEquivalent (String keyword, 
                ULocale locID)

[icu] Returns the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service.

Parameters
keyword String: a particular keyword as enumerated by getKeywords.
locID ULocale: The requested locale
Returns
ULocale the locale

getFunctionalEquivalent

Added in API level 24
ULocale getFunctionalEquivalent (String keyword, 
                ULocale locID, 
                boolean[] isAvailable)

[icu] Returns the functionally equivalent locale for the given requested locale, with respect to given keyword, for the collation service. If two locales return the same result, then collators instantiated for these locales will behave equivalently. The converse is not always true; two collators may in fact be equivalent, but return different results, due to internal details. The return result has no other meaning than that stated above, and implies nothing as to the relationship between the two locales. This is intended for use by applications that wish to cache collators, or otherwise reuse collators when possible. The functional equivalent may change over time. For more information, please see the Locales and Services section of the ICU User Guide.

Parameters
keyword String: a particular keyword as enumerated by getKeywords.
locID ULocale: The requested locale
isAvailable boolean[]: If non-null, isAvailable[0] will receive an output boolean that indicates whether the requested locale was 'available' to the collation service. If non-null, isAvailable must have length >= 1.
Returns
ULocale the locale

getInstance

Added in API level 24
Collator getInstance ()

Returns the Collator for the current default locale. The default locale is determined by java.util.Locale.getDefault().

Returns
Collator the Collator for the default locale (for example, en_US) if it is created successfully. Otherwise if there is no Collator associated with the current locale, the root collator will be returned.

getInstance

Added in API level 24
Collator getInstance (Locale locale)

Returns the Collator for the desired locale.

For some languages, multiple collation types are available; for example, "de-u-co-phonebk". Starting with ICU 54, collation attributes can be specified via locale keywords as well, in the old locale extension syntax ("el@colCaseFirst=upper", only with ULocale) or in language tag syntax ("el-u-kf-upper"). See User Guide: Collation API.

Parameters
locale Locale: the desired locale.
Returns
Collator Collator for the desired locale if it is created successfully. Otherwise if there is no Collator associated with the current locale, the root collator will be returned.

getInstance

Added in API level 24
Collator getInstance (ULocale locale)

[icu] Returns the Collator for the desired locale.

For some languages, multiple collation types are available; for example, "de@collation=phonebook". Starting with ICU 54, collation attributes can be specified via locale keywords as well, in the old locale extension syntax ("el@colCaseFirst=upper") or in language tag syntax ("el-u-kf-upper"). See User Guide: Collation API.

Parameters
locale ULocale: the desired locale.
Returns
Collator Collator for the desired locale if it is created successfully. Otherwise if there is no Collator associated with the current locale, the root collator will be returned.

getKeywordValues

Added in API level 24
String[] getKeywordValues (String keyword)

[icu] Given a keyword, returns an array of all values for that keyword that are currently in use.

Parameters
keyword String: one of the keywords returned by getKeywords.
Returns
String[]

getKeywordValuesForLocale

Added in API level 24
String[] getKeywordValuesForLocale (String key, 
                ULocale locale, 
                boolean commonlyUsed)

[icu] Given a key and a locale, returns an array of string values in a preferred order that would make a difference. These are all and only those values where the open (creation) of the service with the locale formed from the input locale plus input keyword and that value has different behavior than creation with the input locale alone.

Parameters
key String: one of the keys supported by this service. For now, only "collation" is supported.
locale ULocale: the locale
commonlyUsed boolean: if set to true it will return only commonly used values with the given locale in preferred order. Otherwise, it will return all the available values for the locale.
Returns
String[] an array of string values for the given key and the locale.

getKeywords

Added in API level 24
String[] getKeywords ()

[icu] Returns an array of all possible keywords that are relevant to collation. At this point, the only recognized keyword for this service is "collation".

Returns
String[] an array of valid collation keywords.

getMaxVariable

Added in API level 24
int getMaxVariable ()

[icu] Returns the maximum reordering group whose characters are affected by the alternate handling behavior.

The base class implementation returns Collator.ReorderCodes.PUNCTUATION.

Returns
int the maximum variable reordering group.

getReorderCodes

Added in API level 24
int[] getReorderCodes ()

Retrieves the reordering codes for this collator. These reordering codes are a combination of UScript codes and ReorderCodes.

Returns
int[] a copy of the reordering codes for this collator; if none are set then returns an empty array

getStrength

Added in API level 24
int getStrength ()

Returns this Collator's strength attribute. The strength attribute determines the minimum level of difference considered significant. [icu] Note: This can return QUATERNARY strength, which is not supported by the JDK version.

See the Collator class description for more details.

The base class method always returns TERTIARY. Subclasses should override it if appropriate.

Returns
int this Collator's current strength attribute.

getTailoredSet

Added in API level 24
UnicodeSet getTailoredSet ()

[icu] Returns a UnicodeSet that contains all the characters and sequences tailored in this collator.

Returns
UnicodeSet a UnicodeSet object containing all the code points and sequences that may sort differently than in the root collator.

getUCAVersion

Added in API level 24
VersionInfo getUCAVersion ()

[icu] Returns the UCA version of this collator object.

Returns
VersionInfo the version object associated with this collator

getVariableTop

Added in API level 24
int getVariableTop ()

[icu] Gets the variable top value of a Collator.

Returns
int the variable top primary weight

getVersion

Added in API level 24
VersionInfo getVersion ()

[icu] Returns the version of this collator object.

Returns
VersionInfo the version object associated with this collator

isFrozen

Added in API level 24
boolean isFrozen ()

Determines whether the object has been frozen or not.

An unfrozen Collator is mutable and not thread-safe. A frozen Collator is immutable and thread-safe.

Returns
boolean

setDecomposition

Added in API level 24
void setDecomposition (int decomposition)

Sets the decomposition mode of this Collator. Setting this decomposition attribute with CANONICAL_DECOMPOSITION allows the Collator to handle un-normalized text properly, producing the same results as if the text were normalized. If NO_DECOMPOSITION is set, it is the user's responsibility to ensure that all text is already in the appropriate form before a comparison or before getting a CollationKey. Adjusting decomposition mode allows the user to select between faster and more complete collation behavior.

Since a great many of the world's languages do not require text normalization, most locales set NO_DECOMPOSITION as the default decomposition mode.

The base class method does nothing. Subclasses should override it if appropriate.

See getDecomposition for a description of decomposition mode.

Parameters
decomposition int: the new decomposition mode
Throws
IllegalArgumentException If the given value is not a valid decomposition mode.

setMaxVariable

Added in API level 24
Collator setMaxVariable (int group)

[icu] Sets the variable top to the top of the specified reordering group. The variable top determines the highest-sorting character which is affected by the alternate handling behavior. If that attribute is set to UCOL_NON_IGNORABLE, then the variable top has no effect.

The base class implementation throws an UnsupportedOperationException.

Parameters
group int: one of Collator.ReorderCodes.SPACE, Collator.ReorderCodes.PUNCTUATION, Collator.ReorderCodes.SYMBOL, Collator.ReorderCodes.CURRENCY; or Collator.ReorderCodes.DEFAULT to restore the default max variable group
Returns
Collator this

setReorderCodes

Added in API level 24
void setReorderCodes (int... order)

Sets the reordering codes for this collator. Collation reordering allows scripts and some other groups of characters to be moved relative to each other. This reordering is done on top of the DUCET/CLDR standard collation order. Reordering can specify groups to be placed at the start and/or the end of the collation order. These groups are specified using UScript codes and Collator.ReorderCodes entries.

By default, reordering codes specified for the start of the order are placed in the order given after several special non-script blocks. These special groups of characters are space, punctuation, symbol, currency, and digit. These special groups are represented with Collator.ReorderCodes entries. Script groups can be intermingled with these special non-script groups if those special groups are explicitly specified in the reordering.

The special code OTHERS stands for any script that is not explicitly mentioned in the list of reordering codes given. Anything that is after OTHERS will go at the very end of the reordering in the order given.

The special reorder code DEFAULT will reset the reordering for this collator to the default for this collator. The default reordering may be the DUCET/CLDR order or may be a reordering that was specified when this collator was created from resource data or from rules. The DEFAULT code must be the sole code supplied when it is used. If not, then an IllegalArgumentException will be thrown.

The special reorder code NONE will remove any reordering for this collator. The result of setting no reordering will be to have the DUCET/CLDR ordering used. The NONE code must be the sole code supplied when it is used.

Parameters
order int: the reordering codes to apply to this collator; if this is null or an empty array then this clears any existing reordering

setStrength

Added in API level 24
void setStrength (int newStrength)

Sets this Collator's strength attribute. The strength attribute determines the minimum level of difference considered significant during comparison.

The base class method does nothing. Subclasses should override it if appropriate.

See the Collator class description for an example of use.

Parameters
newStrength int: the new strength value.
Throws
IllegalArgumentException if the new strength value is not valid.
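
To make "minimum level of difference considered significant" concrete, the sketch below compares the same pair of strings at three strengths. It is an illustration under stated assumptions, not the library's reference usage: it runs on a plain JVM with java.text.Collator, which shares the PRIMARY, SECONDARY and TERTIARY names and the setStrength and compare calls documented here. On Android the import would be android.icu.text.Collator. The helper name compareAt is hypothetical.

```java
import java.text.Collator; // assumption: plain JVM; on Android use android.icu.text.Collator
import java.util.Locale;

class CollatorStrengthSketch {
    /** Compares a and b with a US English collator set to the given strength. */
    static int compareAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // Case is a tertiary difference, so it is ignored at PRIMARY and SECONDARY.
        System.out.println(compareAt(Collator.PRIMARY, "abc", "ABC") == 0);   // true
        System.out.println(compareAt(Collator.SECONDARY, "abc", "ABC") == 0); // true

        // At TERTIARY the case difference becomes significant.
        System.out.println(compareAt(Collator.TERTIARY, "abc", "ABC") == 0);  // false

        // A base-letter difference is significant at every strength.
        System.out.println(compareAt(Collator.PRIMARY, "abc", "abd") < 0);    // true
    }
}
```

The sketch always passes the symbolic constants, never their numeric values, because some of the values differ between this class and the JDK's.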
