CLV. Unicode Functions

Introduction

Unicode Support.

Warning

This extension is still in development and is not yet available to the public.

Requirements

ICU 3.4 or later is required.

Installation

First you should download and install ICU:

Example 1. Installing ICU on Unix

./configure --disable-threads --enable-extras --enable-icuio --enable-layout
make && make install

Then check out the latest PHP source and configure it with the --with-icu-dir=<dir> option, where <dir> is the directory in which you installed ICU. You do not need to specify this option explicitly if ICU is installed in a standard location.
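The configure step described above might look roughly like the following sketch. The install prefix /usr/local/icu is a hypothetical path chosen for illustration; substitute the prefix that ICU's own make install actually used. Only the --with-icu-dir option comes from the text above; the guard around ./configure is an addition so the script does nothing harmful when run outside a PHP source tree.

```shell
#!/bin/sh
# Assumption: ICU was installed under this prefix (hypothetical path).
ICU_DIR="/usr/local/icu"

# The single option named in the text above.
CONFIGURE_OPT="--with-icu-dir=$ICU_DIR"

if [ -x ./configure ]; then
    # Run from the top of the PHP source checkout.
    ./configure "$CONFIGURE_OPT" && make && make install
else
    # Not in a configured source tree: just show the command to run.
    echo "No ./configure here; from the PHP source tree, run: ./configure $CONFIGURE_OPT"
fi
```

If ICU lives in a standard location, the option can be dropped and a plain ./configure should be enough, as noted above.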

Runtime Configuration

The behaviour of these functions is affected by settings in php.ini.

Table 1. Unicode Configuration Options

Name                            Default  Changeable      Changelog
unicode.fallback_encoding       NULL     PHP_INI_ALL     Available since PHP 6.0.0.
unicode.from_error_mode         2        PHP_INI_ALL     Available since PHP 6.0.0.
unicode.from_error_subst_char   "3f"     PHP_INI_ALL     Available since PHP 6.0.0.
unicode.http_input_encoding     NULL     PHP_INI_ALL     Available since PHP 6.0.0.
unicode.output_encoding         NULL     PHP_INI_ALL     Available since PHP 6.0.0.
unicode.runtime_encoding        NULL     PHP_INI_ALL     Available since PHP 6.0.0.
unicode.script_encoding         NULL     PHP_INI_ALL     Available since PHP 6.0.0.
unicode.semantics               off      PHP_INI_PERDIR  Available since PHP 6.0.0.
For further details and definitions of the PHP_INI_* constants, see Appendix G.

Here's a short explanation of the configuration directives.

unicode.output_encoding string

Default encoding for output.
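As a rough sketch, the directive could be set by adding a line to php.ini. The snippet below writes such a line to a scratch file so that it can be inspected before being merged into a real configuration. The file name unicode-example.ini and the value utf-8 are assumptions made for illustration; the text above does not say which encoding names the extension accepts.

```shell
#!/bin/sh
# Hypothetical scratch file; merge its contents into your real php.ini by hand.
INI_FRAGMENT="unicode-example.ini"

# Assumption: "utf-8" is an accepted encoding name for this directive.
cat > "$INI_FRAGMENT" <<'EOF'
; Default encoding for output (directive name taken from Table 1)
unicode.output_encoding = "utf-8"
EOF

# Show what was written.
cat "$INI_FRAGMENT"
```

Because Table 1 lists unicode.output_encoding as PHP_INI_ALL, the setting is not restricted to php.ini and may be changed anywhere PHP accepts configuration changes.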

Predefined Constants

The constants below are defined by this extension, and will only be available when the extension has either been compiled into PHP or dynamically loaded at runtime.

Table 2.

Constant              Value  Description
U_INVALID_STOP        0      stop at first invalid character
U_INVALID_SKIP        1      skip invalid characters
U_INVALID_SUBSTITUTE  2      replace invalid characters
U_INVALID_ESCAPE      3      escape invalid characters

Table of Contents
i18n_loc_get_default -- Get the default Locale
i18n_loc_set_default -- Set the default Locale
unicode_encode -- Takes a Unicode string and converts it to a string in the specified encoding