#P6276. [USACO20OPEN] Exercise P

[USACO20OPEN] Exercise P

Description

Farmer John has (again) come up with a new morning exercise plan for the cows!
As before, Farmer John’s NN cows stand in a line. For each 1iN1\le i\le N, the ii-th cow from left to right has ID ii. He tells them to repeat the following step until the cows are in the same order as they started.

Given a permutation AA of length NN, the cows change their order so that the cow that was the ii-th from left to right before the change becomes the AiA_i-th from left to right after the change.
For example, if A=(1,2,3,4,5)A=(1,2,3,4,5), then the cows return to the same order after a total of one step. If A=(2,3,1,5,4)A=(2,3,1,5,4), then the cows return to the starting order after a total of six steps. After each step, the order of the cows from left to right is as follows:

Step 0: (1,2,3,4,5)(1,2,3,4,5)
Step 1: (3,1,2,5,4)(3,1,2,5,4)
Step 2: (2,3,1,4,5)(2,3,1,4,5)
Step 3: (1,2,3,5,4)(1,2,3,5,4)
Step 4: (3,1,2,4,5)(3,1,2,4,5)
Step 5: (2,3,1,5,4)(2,3,1,5,4)
Step 6: (1,2,3,4,5)(1,2,3,4,5)

Please compute the product, over all N!N! permutations AA of length NN, of the number of steps needed to return to the starting order.

Since this number can be very large, output the remainder modulo MM (108M109+710^8\le M\le 10^9+7, and MM is prime).


C++ users may use the following code from KACTL. This algorithm, called Barrett reduction, can compute a%ba \% b several times faster than the usual method, where b>1b>1 is a constant unknown at compile time. (Unfortunately, we were unable to find a similar optimization for Java.) (Translator’s note: Chinese users may refer to 几种取模优化方法(译自 min-25 的博客).)

#include <bits/stdc++.h>
using namespace std;

typedef unsigned long long ull;
typedef __uint128_t L;
struct FastMod {
    ull b, m;
    FastMod(ull b) : b(b), m(ull((L(1) << 64) / b)) {}
    ull reduce(ull a) {
        ull q = (ull)((L(m) * a) >> 64);
        ull r = a - q * b; // can be proven that 0 <= r < 2*b
        return r >= b ? r - b : r;
    }
};
FastMod F(2);

int main() {
    int M = 1000000007; F = FastMod(M);
    ull x = 10ULL*M+3; 
    cout << x << " " << F.reduce(x) << "\n"; // 10000000073 3
}

Input Format

The input contains one line with NN and MM.

Output Format

Output one integer.

5 1000000007
369329541

Hint

Sample Explanation:

For each 1iN1\le i\le N, the ii-th element of the following sequence equals the number of permutations for which the cows need ii steps: [1,25,20,30,24,20][1,25,20,30,24,20]. Therefore, the answer is $1^1\cdot 2^{25}\cdot 3^{20}\cdot 4^{30}\cdot 5^{24}\cdot 6^{20}\equiv 369329541\pmod{10^9+7}$.

Note: The memory limit for this problem is increased to 512 MB.


For 100%100\% of the testdata, 1N75001\le N\le 7500.

There are 1616 test points. Test point 11 is the sample, and the others have the following properties:

Test point 22 satisfies N=8N=8.
Test points 353\sim 5 satisfy N50N\le 50.
Test points 686\sim 8 satisfy N500N\le 500.
Test points 9129\sim 12 satisfy N3000N\le 3000.
Test points 131613\sim 16 have no additional constraints.


Problem setter: Benjamin Qi

Translated by ChatGPT 5