#P7641. [BalticOI 2006] RLE COMPRESSION (Day 2)

[BalticOI 2006] RLE COMPRESSION (Day 2)

Description

RLE is a simple compression algorithm used to compress sequences that contain consecutive repetitions of the same character. By compressing a given sequence, we obtain its encoding. The idea is to use a counter to represent repetitions of a given character (e.g. aaaaa\texttt{aaaaa}), indicating how many times it repeats. That is, we represent it using a triple consisting of a repetition marker, the repeated character, and an integer that represents the number of repetitions. For example, aaaaa\texttt{aaaaa} can be encoded as #a5\texttt{#a5} (where \texttt{#} is the repetition marker).

We need some way to represent letters, the repetition marker, and the counter. Let the alphabet consist of nn characters, represented by integers from the set Σ={0,1,,n1}\Sigma = \lbrace 0,1,\cdots,n-1 \rbrace. The encoding of a character sequence over Σ\Sigma is also a character sequence over Σ\Sigma. At any moment, the repetition marker is represented by a character from Σ\Sigma, denoted by ee. Initially, ee is 00, but it may change during the encoding.

The encoding is interpreted as follows:

  • Any character aa in the encoding, except for the repetition marker, represents itself.
  • If the repetition marker ee occurs in the encoding, then the next two characters have special meanings:
    • If ee is followed by ekek, it represents k+1k+1 repetitions of ee.
    • Otherwise, if ee is followed by b0b0 (where b=/eb {=}\mathllap{/\,} e), then bb becomes the repetition marker starting from that point.
    • Otherwise, if ee is followed by bkbk (where b=/eb {=}\mathllap{/\,} e and k>0k>0), it represents k+3k+3 repetitions of bb.

Using the scheme above, we can encode any character sequence over Σ\Sigma. For example, for n=4n=4, the sequence 1002222223333303020000\texttt{1002222223333303020000} can be encoded as 10010230320100302101\texttt{10010230320100302101}. The first character 1\texttt{1} in the encoding represents the plain character 1\texttt{1}. The next 001\texttt{001} encodes 00\texttt{00}. Then 023\texttt{023} represents 222222\texttt{222222}, 032\texttt{032} represents 33333\texttt{33333}, and 010\texttt{010} switches the repetition marker to 1\texttt{1}. After that, 0302\texttt{0302} represents itself, and finally 101\texttt{101} encodes 0000\texttt{0000}.

A sequence can be encoded in multiple ways, and the length of the encoding may vary. Given an already encoded sequence, your task is to find an encoding with the minimum number of characters.

Write a program that:

  • Reads the size of the alphabet and the encoding of the sequence.
  • Finds the shortest encoding of the sequence.
  • Outputs the result.

Input Format

The first line contains an integer nn, the size of the alphabet. The second line contains an integer mm, the length of the encoding. The last line contains mm integers from {0,1,,n1}\lbrace 0,1,\cdots,n−1 \rbrace, separated by single spaces, representing the encoding of the sequence.

Output Format

The first line should contain an integer mm', the minimum number of characters in an encoding that represents the given sequence. The last line of output should contain mm' integers from the set {0,1,,n1}\lbrace 0,1,\cdots,n−1 \rbrace, separated by single spaces, giving an encoding of the sequence. If there are several shortest encodings, the program may output any one of them.

4
20
1 0 0 1 0 2 3 0 3 2 0 1 0 0 3 0 2 1 0 1
19
1 0 1 0 0 0 1 2 3 1 3 2 0 3 0 2 1 0 1
14
15
10 10 10 0 10 0 10 10 13 10 10 13 10 10 13
9
0 10 13 0 10 13 0 10 10

Hint

Constraints

For 100%100\% of the testdata, 2n1052 \le n \le 10^5, 1m2×1061 \le m \le 2×10^6.

Notes

The problem is from Baltic Olympiad in Informatics 2006, Day 2: RLE. Translated and整理 (pinyin: zhengli) by

/user/274209

Translated by ChatGPT 5