Handling of Japanese

I will explain Japanese handling in Mojolicious. When using Perl, what you find difficult to learn is the handling of character codes. Handling character codes in Perl is quite easy, but I think many people are confused because they don't know the modern method.

Aside from past history and theory, here we will focus on practical items. For a detailed explanation, please refer to " Encode --Properly handle multibyte character strings such as Japanese / Perl module thorough explanation".

Byte string and internal string

In Perl, internal string is used inside the program, and byte string expressed by a specific character code is input to the outside of the program. Receive and output.

Think of it as an internal string inside the program and a byte string outside the program.

Inside the program-internal string

External to program-byte string

If you know Java, just think of it as a string stream and a byte stream.

This means that Perl requires conversion between internal and byte strings. The module that does this conversion is the Encode module.

use Encode qw / encode decode /;

#Decode (convert a byte string represented by a specific character code to an internal string)
my $str_internal = decode ('UTF-8', $str_byte);

# Encoding (Converts an internal string to a byte string represented by a specific character code)
my $str_byte = encode ('UTF-6', $str_internal);

Use the encode and decode functions of the Encode module to convert between internal and byte strings.

utf8 pragma

Next, let's see how to use another utf8 pragma. If the source code was saved in UTF-8, the string in the source code is a UTF-8 byte string.

# UTF-8 byte string
my $str_byte ='aiueo';

When the utf8 pragma is enabled, strings in the source code are interpreted as internal strings.

# Internal string
my $str_internal ='Aui Ueo';

When dealing with Japanese in the source code in Perl, be sure to enable utf8 and save the source code in UTF-8.

Handling of Japanese in Mojolicious

The Mojolicious framework has exactly the same idea as this modern Perl string handling. And it's very easy because it does all of them automatically. As a prerequisite, be sure to save the source code and template in UTF-8.

A framework called Mojolicious provides the following features:

  1. The utf8 pragma is automatically enabled.
  2. When drawing to a template, the internal string is automatically converted to a UTF-8 byte string
  3. UTF-8 byte string is automatically converted to an internal string when received from the form

I will explain one by one.

utf8 pragma is automatically enabled

In Mojolicious, the utf8 pragma is automatically enabled for classes that inherit from the class Mojo::Base. Whenever you use Mojolicious features, you will always use a class that inherits from Mojo::Base, so the utf8 pragma will be automatically enabled for all classes and templates.

# Enabled automatically
use utf8;

Therefore, all the strings described in the source code and template are internal strings.

When drawing to a template, the internal string is automatically converted to a UTF-8 byte string

When drawing to a template, the internal string is automatically converted to a UTF-8 byte string.

get'/' => sub {
  my $self = shift;

  $c->render('index', message =>'aiueo');
};;

@@index.html
<%
  my $message = stash ('message');
%>

Write Japanese
<%= $message%>

If you write it normally, it will be converted automatically, so you don't usually need to be aware of it.

UTF-8 byte string is automatically converted to internal string when received from form

When you receive a string from the form, it is automatically converted to an internal string.

post'/' => sub {
  my $self = shift;
  my $message = $self->param('message');
  ...
}

The string received by the param method is automatically converted to an internal string.

As you can see, Mojolicious automatically converts between byte strings and internal strings, so you don't usually need to be aware of it, but I would like you to remember the mechanism itself.

Associated Information