<p>Design and optimize <strong>high-performance Triton kernels</strong> for <strong>GDPA (Gated Dot-Product Attention)</strong> computation on GPU. This problem focuses on implementing efficient attention kernels with gated Q and K tensors using Triton's JIT compilation system.</p>
<h3>The Challenge</h3>
<p>You must optimize several critical aspects:</p>
<ul>
<li><strong>Gated attention computation</strong>: Efficient computation of scaled dot-product attention with gated Q and K tensors</li>
<li><strong>Gating mechanism</strong>: Applying sigmoid gates to Q and K tensors before attention computation</li>
<li><strong>Memory access patterns</strong>: Efficient loading and storing of Q, K, V, GQ, GK tensors</li>
<li><strong>Numerical stability</strong>: Keeping the softmax numerically stable via a streaming (online) formulation</li>
<li><strong>Block tiling</strong>: Choosing block sizes that map well onto the GPU across different sequence lengths</li>
<li><strong>Performance benchmarking</strong>: Achieving a speedup over the baseline PyTorch implementation</li>
</ul>
<h2>Optimization Targets</h2>
<ol>
<li><strong>Primary</strong>: Maximize geometric mean speedup over the baseline (higher is better)</li>
<li><strong>Secondary</strong>: Ensure correctness across diverse sequence lengths and attention heads</li>
<li><strong>Tertiary</strong>: Minimize kernel launch overhead and memory usage</li>
</ol>
<h2>API Specification</h2>
<p>Implement a <code>Solution</code> class that returns a Triton kernel implementation:</p>
<pre><code class="language-python">class Solution:
    def solve(self, spec_path: str = None) -> dict:
        """
        Returns a dict with either:
        - {"code": "python_code_string"}
        - {"program_path": "path/to/kernel.py"}
        """
        # Your implementation
        pass</code></pre>
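<p>For orientation, here is a minimal sketch of how a harness might consume this API. The call pattern (executing the returned code string and pulling out <code>gdpa_attn</code>) is an assumption for illustration, not the documented evaluator behavior:</p>
<pre><code class="language-python"># Hypothetical usage sketch -- the actual evaluation harness may differ.
result = Solution().solve()
if "code" in result:
    namespace = {}
    exec(result["code"], namespace)      # materialize the submitted kernel module
    gdpa_attn = namespace["gdpa_attn"]   # the required entry point (see below)
else:
    print("Kernel module provided at:", result["program_path"])</code></pre>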
<p>Your kernel implementation must provide a <code>gdpa_attn</code> function:</p>
<pre><code class="language-python">import torch
import triton
import triton.language as tl
def gdpa_attn(
    Q: torch.Tensor,
    K: torch.Tensor,
    V: torch.Tensor,
    GQ: torch.Tensor,
    GK: torch.Tensor,
) -> torch.Tensor:
    """
    GDPA attention computation with gated Q and K tensors.

    Args:
        Q:  Query tensor of shape (Z, H, M, Dq) - float16
        K:  Key tensor of shape (Z, H, N, Dq) - float16
        V:  Value tensor of shape (Z, H, N, Dv) - float16
        GQ: Query gate tensor of shape (Z, H, M, Dq) - float16
        GK: Key gate tensor of shape (Z, H, N, Dq) - float16

    Returns:
        Output tensor of shape (Z, H, M, Dv) - float16
    """
    # Your implementation
    pass</code></pre>
<h2>Input Specifications</h2>
<h3>Tensor Shapes</h3>
<table>
<thead>
<tr>
<th>Tensor</th>
<th>Shape</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Q</strong></td>
<td><code>(Z, H, M, Dq)</code></td>
<td>Query tensor</td>
</tr>
<tr>
<td><strong>K</strong></td>
<td><code>(Z, H, N, Dq)</code></td>
<td>Key tensor</td>
</tr>
<tr>
<td><strong>V</strong></td>
<td><code>(Z, H, N, Dv)</code></td>
<td>Value tensor</td>
</tr>
<tr>
<td><strong>GQ</strong></td>
<td><code>(Z, H, M, Dq)</code></td>
<td>Query gate tensor</td>
</tr>
<tr>
<td><strong>GK</strong></td>
<td><code>(Z, H, N, Dq)</code></td>
<td>Key gate tensor</td>
</tr>
</tbody>
</table>
<h3>Dimension Details</h3>
<ul>
<li><strong>Z</strong>: Batch size (typically 1)</li>
<li><strong>H</strong>: Number of attention heads (typically 8)</li>
<li><strong>M</strong>: Query sequence length (tested with 512, 1024)</li>
<li><strong>N</strong>: Key/value sequence length (equals M in the test cases)</li>
<li><strong>Dq</strong>: Query/key feature dimension (typically 64)</li>
<li><strong>Dv</strong>: Value feature dimension (typically 64)</li>
</ul>
<h3>Data Type</h3>
<ul>
<li>All inputs are <strong><code>torch.float16</code></strong></li>
<li>All inputs are on the <strong>CUDA device</strong> (see the generation sketch below)</li>
</ul>
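<p>A minimal sketch of generating inputs that match these specifications (shapes, dtype, device, and the fixed seed follow the spec; the exact generator used by the harness is an assumption):</p>
<pre><code class="language-python">import torch

def make_inputs(M: int = 512, Z: int = 1, H: int = 8, Dq: int = 64, Dv: int = 64):
    """Create float16 CUDA tensors with the shapes listed above (illustrative sketch)."""
    torch.manual_seed(0)   # fixed seed, as described in the evaluation details
    N = M                  # key/value length equals the query length in the tests
    Q  = torch.randn(Z, H, M, Dq, device="cuda", dtype=torch.float16)
    K  = torch.randn(Z, H, N, Dq, device="cuda", dtype=torch.float16)
    V  = torch.randn(Z, H, N, Dv, device="cuda", dtype=torch.float16)
    GQ = torch.randn(Z, H, M, Dq, device="cuda", dtype=torch.float16)
    GK = torch.randn(Z, H, N, Dq, device="cuda", dtype=torch.float16)
    return Q, K, V, GQ, GK</code></pre>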
<h2>Output Specifications</h2>
<ul>
<li><strong>Shape</strong>: <code>(Z, H, M, Dv)</code> - batch, head, and query-length dimensions from Q, value feature dimension from V</li>
<li><strong>Dtype</strong>: <code>torch.float16</code></li>
<li><strong>Device</strong>: Same as input (CUDA)</li>
</ul>
<h2>GDPA Algorithm</h2>
<p>The Gated Dot-Product Attention algorithm consists of:</p>
<h3>Step 1: Apply Gating</h3>
<pre><code class="language-python">Qg = Q * sigmoid(GQ)
Kg = K * sigmoid(GK)</code></pre>
<h3>Step 2: Scaled Dot-Product Attention</h3>
<pre><code class="language-python">scale = 1.0 / sqrt(Dq)
scores = (Qg @ Kg.transpose(-2, -1)) * scale
attn_weights = softmax(scores, dim=-1)
output = attn_weights @ V</code></pre>
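<p>Putting the two steps together, a straightforward PyTorch reference implementation (useful as a correctness baseline; accumulating the matmuls in float32 before casting back is an assumption for accuracy, not a requirement of the spec):</p>
<pre><code class="language-python">import torch

def gdpa_attn_reference(Q, K, V, GQ, GK):
    """Unoptimized reference: gate Q and K, then scaled dot-product attention."""
    Qg = Q * torch.sigmoid(GQ)                 # Step 1: gate the queries
    Kg = K * torch.sigmoid(GK)                 # Step 1: gate the keys
    scale = 1.0 / (Q.shape[-1] ** 0.5)         # 1 / sqrt(Dq)
    scores = (Qg.float() @ Kg.float().transpose(-2, -1)) * scale
    attn = torch.softmax(scores, dim=-1)       # Step 2: softmax over the key axis
    return (attn @ V.float()).to(V.dtype)      # weighted sum of values, back to fp16</code></pre>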
<h2>Correctness Requirements</h2>
<ul>
<li><strong>Numerical correctness</strong> verified against the PyTorch baseline implementation (a check is sketched after this list)</li>
<li><strong>Relative tolerance</strong>: 1e-2</li>
<li><strong>Absolute tolerance</strong>: 5e-3</li>
<li><strong>All test cases must pass</strong> for any score above 0</li>
<li>Gating must be correctly applied before attention computation</li>
</ul>
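<p>A sketch of the kind of check these tolerances imply, using <code>torch.testing.assert_close</code> (the harness's exact comparison call is an assumption):</p>
<pre><code class="language-python">import torch

def check_correctness(out: torch.Tensor, ref: torch.Tensor) -> None:
    """Raise if the kernel output deviates from the reference beyond the stated tolerances."""
    torch.testing.assert_close(out.float(), ref.float(), rtol=1e-2, atol=5e-3)</code></pre>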
<h2>Scoring (0-100)</h2>
<p>Performance is measured against <strong>GPU baseline implementations</strong>:</p>
<pre><code class="language-python">geometric_mean_gpu_time = geometric_mean(gpu_baseline_times)
geometric_mean_answer_time = geometric_mean(answer_times)

# Linear interpolation in runtime
target_time_0 = geometric_mean_gpu_time          # 0 points (1x GPU baseline)
target_time_100 = geometric_mean_gpu_time / 3.0  # 100 points (3x speedup)
score = 100 * (target_time_0 - geometric_mean_answer_time) / (target_time_0 - target_time_100)</code></pre>
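<p>As a worked example of this formula (the millisecond values are purely illustrative), a helper that maps measured times to points:</p>
<pre><code class="language-python">def score_from_times(baseline_ms: float, answer_ms: float) -> float:
    """Linear interpolation in runtime between 1x (0 points) and 3x (100 points)."""
    target_time_0 = baseline_ms          # 0 points at baseline speed
    target_time_100 = baseline_ms / 3.0  # 100 points at a 3x speedup
    return 100 * (target_time_0 - answer_ms) / (target_time_0 - target_time_100)

# With an illustrative 10 ms baseline geometric mean:
print(score_from_times(10.0, 10.0))      # 0.0   -> 1x speedup
print(score_from_times(10.0, 10.0 / 2))  # 75.0  -> 2x speedup
print(score_from_times(10.0, 10.0 / 3))  # 100.0 -> 3x speedup</code></pre>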
<h3>Score Interpretation</h3>
<table>
<thead>
<tr>
<th>Performance</th>
<th>Score</th>
<th>Speedup vs GPU Baseline</th>
</tr>
</thead>
<tbody>
<tr>
<td>Baseline performance</td>
<td>0 points</td>
<td>1.0x</td>
</tr>
<tr>
<td>1.5x speedup</td>
<td>50 points</td>
<td>1.5x</td>
</tr>
<tr>
<td>2x speedup</td>
<td>75 points</td>
<td>2.0x</td>
</tr>
<tr>
<td>3x speedup</td>
<td>100 points</td>
<td>3.0x</td>
</tr>
<tr>
<td>4x speedup (exceptional)</td>
<td>>100 points</td>
<td>4.0x</td>
</tr>
</tbody>
</table>
<p><strong>Note:</strong> The score is interpolated linearly in runtime between the 1x baseline time (0 points) and one third of it, i.e. a 3x speedup (100 points); speedups beyond 3x can score above 100.</p>
<h2>Evaluation Details</h2>
<h3>Test Configuration</h3>
<ul>
<li><strong>Test cases</strong>: M = 512, 1024 (with N = M)</li>
<li><strong>Warmup phase</strong>: 10 iterations to stabilize GPU clocks and caches</li>
<li><strong>Random seed</strong>: Fixed seed (0) for reproducible data generation</li>
<li><strong>Strict correctness</strong>: Any test failure results in score of 0</li>
</ul>
<h3>Timing Methodology</h3>
<ol>
<li>Generate input tensors on the GPU</li>
<li>Warm up with 10 iterations</li>
<li>Measure the time for the actual computation (a sketch follows below)</li>
<li>Verify correctness against the reference</li>
<li>Compute the geometric mean across test cases</li>
</ol>
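<p>A sketch of this methodology using CUDA events (the harness's exact timing code is an assumption; the essential parts are the warmup loop and the synchronization around the measured region):</p>
<pre><code class="language-python">import torch

def time_kernel_ms(fn, *args, warmup: int = 10, iters: int = 50) -> float:
    """Return the average runtime of fn(*args) in milliseconds."""
    for _ in range(warmup):          # stabilize clocks, caches, and JIT compilation
        fn(*args)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters   # elapsed_time() reports milliseconds</code></pre>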
<h2>Implementation Tips</h2>
<h3>Performance Optimization</h3>
<ol>
<li><strong>Use Triton's block pointers</strong> (<code>tl.make_block_ptr</code>) for efficient memory access (see the sketch after this list)</li>
<li><strong>Implement streaming softmax</strong> for numerical stability and memory efficiency</li>
<li><strong>Tune block sizes</strong> (e.g., BLOCK_M, BLOCK_N, BLOCK_K) for different sequence lengths</li>
<li><strong>Minimize memory transactions</strong> through proper tiling strategies</li>
<li><strong>Leverage warp-level primitives</strong> for maximum parallelism</li>
</ol>
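<p>For tip 1, a self-contained sketch of <code>tl.make_block_ptr</code> on a toy tiled-copy kernel; the same pattern applies to the Q/K/V/GQ/GK tiles of the attention kernel (the block sizes and the copy task itself are illustrative choices, not requirements):</p>
<pre><code class="language-python">import torch
import triton
import triton.language as tl

@triton.jit
def tiled_copy_kernel(X, Out, M, D, stride_m, stride_d,
                      BLOCK_M: tl.constexpr, BLOCK_D: tl.constexpr):
    # Each program copies one (BLOCK_M, BLOCK_D) tile, addressed via block pointers.
    pid = tl.program_id(0)
    x_ptr = tl.make_block_ptr(base=X, shape=(M, D), strides=(stride_m, stride_d),
                              offsets=(pid * BLOCK_M, 0),
                              block_shape=(BLOCK_M, BLOCK_D), order=(1, 0))
    o_ptr = tl.make_block_ptr(base=Out, shape=(M, D), strides=(stride_m, stride_d),
                              offsets=(pid * BLOCK_M, 0),
                              block_shape=(BLOCK_M, BLOCK_D), order=(1, 0))
    tile = tl.load(x_ptr, boundary_check=(0,), padding_option="zero")
    tl.store(o_ptr, tile, boundary_check=(0,))

x = torch.randn(512, 64, device="cuda", dtype=torch.float16)
out = torch.empty_like(x)
tiled_copy_kernel[(triton.cdiv(512, 64),)](x, out, 512, 64, x.stride(0), x.stride(1),
                                           BLOCK_M=64, BLOCK_D=64)</code></pre>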
<h3>Numerical Stability</h3>
<ul>
<li>Use the <strong>online softmax</strong> algorithm to avoid overflow (see the sketch below)</li>
<li>Compute max and sum in a streaming fashion</li>
<li>Scale appropriately: <code>scale = 1.0 / sqrt(Dq)</code></li>
</ul>
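<p>A minimal CPU-side sketch of the online softmax recurrence described above: keep a running row maximum and a running sum of exponentials, and rescale the accumulator whenever the maximum grows. The blocked structure mirrors what a Triton kernel does per key tile (the block size and the torch-based formulation are illustrative):</p>
<pre><code class="language-python">import torch

def streaming_softmax_matmul(scores: torch.Tensor, V: torch.Tensor, block: int = 32) -> torch.Tensor:
    """Compute softmax(scores, dim=-1) @ V one key block at a time, numerically stably."""
    scores = scores.float()
    V = V.float()
    M, N = scores.shape
    m = torch.full((M,), float("-inf"))        # running row maximum
    l = torch.zeros(M)                         # running row sum of exponentials
    acc = torch.zeros(M, V.shape[1])           # running weighted sum of V rows
    for start in range(0, N, block):
        s = scores[:, start:start + block]
        m_new = torch.maximum(m, s.max(dim=1).values)
        alpha = torch.exp(m - m_new)           # rescale previously accumulated results
        p = torch.exp(s - m_new[:, None])
        acc = acc * alpha[:, None] + p @ V[start:start + block]
        l = l * alpha + p.sum(dim=1)
        m = m_new
    return acc / l[:, None]

# Matches the two-pass formulation up to floating-point error:
# torch.allclose(streaming_softmax_matmul(S, V), torch.softmax(S, -1) @ V)</code></pre>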
<h3>Memory Considerations</h3>
<ul>
<li><strong>Register usage</strong>: Balance between parallelism and register spilling</li>
<li><strong>Shared memory</strong>: Use efficiently for tiling</li>
<li><strong>Global memory coalescing</strong>: Ensure contiguous access patterns</li>
</ul>
<h2>Example Kernel Structure</h2>
<pre><code class="language-python">@triton.jit
def gdpa_attention_kernel(
    Q, K, V, GQ, GK, Out,
    stride_qz, stride_qh, stride_qm, stride_qd,
    # ... other strides
    M, N, Dq, Dv, H,
    BLOCK_M: tl.constexpr,
    BLOCK_N: tl.constexpr,
    BLOCK_Dq: tl.constexpr,
    BLOCK_Dv: tl.constexpr,
):
    # Compute block indices
    # Load Q, K, V, GQ, GK blocks
    # Apply gating: Qg = Q * sigmoid(GQ), Kg = K * sigmoid(GK)
    # Compute attention: scores = Qg @ Kg.T * scale
    # Apply softmax and matmul with V
    # Store output
    pass</code></pre>
<h2>Resources</h2>
<ul>
<li><strong>Triton Documentation</strong>: <a href="https://triton-lang.org">https://triton-lang.org</a></li>
<li><strong>Flash Attention Paper</strong>: Understanding efficient attention mechanisms</li>
<li><strong>GPU Performance Primer</strong>: CUDA programming best practices</li>
</ul>
<h2>Common Pitfalls</h2>
<ol>
<li>Not using streaming softmax → numerical instability</li>
<li>Poor block size choices → low occupancy</li>
<li>Inefficient memory access patterns → memory bottleneck</li>
<li>Ignoring warmup → inconsistent timing results</li>
<li>Incorrect gating implementation → wrong results</li>
</ol>
<h2>Success Criteria</h2>
<ul>
<li>All correctness tests pass</li>
<li>Achieve at least a 1.5x speedup over the baseline (≥50 points)</li>
<li>Kernel works for both M=512 and M=1024</li>
<li>Numerical stability across all test cases</li>
</ul>
<hr>
<h2>Problem Metadata</h2>
<ul>
<li><strong>Domain:</strong> High-Performance Computing (HPC) / GPU Programming</li>
<li><strong>Framework:</strong> Triton JIT Compilation</li>
<li><strong>Hardware:</strong> NVIDIA GPU with CUDA support</li>
</ul>
<hr>
<h2>Model Information</h2>
<p><strong>Model:</strong> Grok 4</p>
<p><strong>Score:</strong> 95.3</p>
<h3>Solution Code</h3>
<pre><code>import torch
import triton
import triton.language as tl
@triton.jit
def gdpa_attn_kernel(
    Q_ptr, K_ptr, V_ptr, GQ_ptr, GK_ptr, Out_ptr,
    Z, H, M, N, Dq, Dv,
    stride_Q_z, stride_Q_h, stride_Q_m, stride_Q_d,
    stride_K_z, stride_K_h, stride_K_n, stride_K_d,
    stride_V_z, stride_V_h, stride_V_n, stride_V_d,
    stride_GQ_z, stride_GQ_h, stride_GQ_m, stride_GQ_d,
    stride_GK_z, stride_GK_h, stride_GK_n, stride_GK_d,
    stride_Out_z, stride_Out_h, stride_Out_m, stride_Out_d,
    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_D: tl.constexpr,
):
    pid_z = tl.program_id(0)
    pid_h = tl.program_id(1)
    pid_m = tl.program_id(2)
    z = pid_z
    h = pid_h
    m_start = pid_m * BLOCK_M
    m_offsets = m_start + tl.arange(0, BLOCK_M)
    m_mask = m_offsets < M
    # Load Q and GQ
    d_offsets = tl.arange(0, BLOCK_D)
    Q_ptrs = Q_ptr + z * stride_Q_z + h * stride_Q_h + m_offsets[:, None] * stride_Q_m + d_offsets[None, :] * stride_Q_d
    Q_mask = m_mask[:, None] & (d_offsets[None, :] < Dq)
    Q = tl.load(Q_ptrs, mask=Q_mask, other=0.0)
    GQ_ptrs = GQ_ptr + z * stride_GQ_z + h * stride_GQ_h + m_offsets[:, None] * stride_GQ_m + d_offsets[None, :] * stride_GQ_d
    GQ = tl.load(GQ_ptrs, mask=Q_mask, other=0.0)
    GQ_sig = 1.0 / (1.0 + tl.exp(-GQ.to(tl.float32)))
    Qg = Q * GQ_sig.to(Q.dtype)
    # Initialize accumulators
    l = tl.full((BLOCK_M,), -float('inf'), dtype=tl.float32)
    acc = tl.zeros((BLOCK_M, BLOCK_D), dtype=tl.float32)
    denom = tl.zeros((BLOCK_M,), dtype=tl.float32)
    scale = tl.rsqrt(tl.full((), Dq, dtype=tl.float32))
    for n_start in tl.range(0, N, BLOCK_N):
        n_offsets = n_start + tl.arange(0, BLOCK_N)
        n_mask = n_offsets < N
        # Load K and GK
        K_ptrs = K_ptr + z * stride_K_z + h * stride_K_h + n_offsets[:, None] * stride_K_n + d_offsets[None, :] * stride_K_d
        K_mask = n_mask[:, None] & (d_offsets[None, :] < Dq)
        K = tl.load(K_ptrs, mask=K_mask, other=0.0)
        GK_ptrs = GK_ptr + z * stride_GK_z + h * stride_GK_h + n_offsets[:, None] * stride_GK_n + d_offsets[None, :] * stride_GK_d
        GK = tl.load(GK_ptrs, mask=K_mask, other=0.0)
        GK_sig = 1.0 / (1.0 + tl.exp(-GK.to(tl.float32)))
        Kg = K * GK_sig.to(K.dtype)
        # Compute attention scores
        attn = tl.dot(Qg, tl.trans(Kg), out_dtype=tl.float32) * scale
        # Load V
        V_ptrs = V_ptr + z * stride_V_z + h * stride_V_h + n_offsets[:, None] * stride_V_n + d_offsets[None, :] * stride_V_d
        V_mask = n_mask[:, None] & (d_offsets[None, :] < Dv)
        V = tl.load(V_ptrs, mask=V_mask, other=0.0)
        V_fp32 = V.to(tl.float32)
        # Streaming softmax update
        attn_max = tl.max(attn, axis=1)
        new_m = tl.maximum(l, attn_max)
        exp_attn = tl.exp(attn - new_m[:, None])
        alpha = tl.exp(l - new_m)
        acc = acc * alpha[:, None] + tl.dot(exp_attn, V_fp32, out_dtype=tl.float32)
        denom = denom * alpha + tl.sum(exp_attn, axis=1)
        l = new_m
    # Store output
    denom = tl.maximum(denom, 1e-9)
    acc = acc / denom[:, None]
    Out_ptrs = Out_ptr + z * stride_Out_z + h * stride_Out_h + m_offsets[:, None] * stride_Out_m + d_offsets[None, :] * stride_Out_d
    Out_mask = m_mask[:, None] & (d_offsets[None, :] < Dv)
    tl.store(Out_ptrs, acc.to(tl.float16), mask=Out_mask)
def gdpa_attn(Q: torch.Tensor, K: torch.Tensor, V: torch.Tensor, GQ: torch.Tensor, GK: torch.Tensor) -> torch.Tensor:
    Z, H, M, Dq = Q.shape
    N = K.shape[2]
    Dv = V.shape[3]
    assert K.shape == (Z, H, N, Dq)
    assert V.shape == (Z, H, N, Dv)
    assert GQ.shape == (Z, H, M, Dq)
    assert GK.shape == (Z, H, N, Dq)
    Out = torch.empty_like(V[:, :, :M, :])  # (Z, H, M, Dv)
    # Strides
    stride_Q = Q.stride()
    stride_K = K.stride()
    stride_V = V.stride()
    stride_GQ = GQ.stride()
    stride_GK = GK.stride()
    stride_Out = Out.stride()
    BLOCK_M = 32
    BLOCK_N = 32
    BLOCK_D = 64  # Assuming Dq == Dv == 64
    grid = (Z, H, triton.cdiv(M, BLOCK_M))
    gdpa_attn_kernel[grid](
        Q, K, V, GQ, GK, Out,
        Z, H, M, N, Dq, Dv,
        *stride_Q, *stride_K, *stride_V, *stride_GQ, *stride_GK, *stride_Out,
        BLOCK_M, BLOCK_N, BLOCK_D,
    )
    return Out
class Solution:
    def solve(self, spec_path: str = None) -> dict:
        code = '''
import torch
import triton
import triton.language as tl

@triton.jit
def gdpa_attn_kernel(
    Q_ptr, K_ptr, V_ptr, GQ_ptr, GK_ptr, Out_ptr,
    Z, H, M, N, Dq, Dv,
    stride_Q_z, stride_Q_h, stride_Q_m, stride_Q_d,
    stride_K_z, stride_K_h, stride_K_n, stride_K_d,
    stride_V_z, stride_V_h, stride_V_n, stride_V_d,
    stride_GQ_z, stride_GQ_h, stride_GQ_m, stride_GQ_d,
    stride_GK_z, stride_GK_h, stride_GK_n, stride_GK_d,
    stride_Out_z, stride_Out_h, stride_Out_m, stride_Out_d,
    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_D: tl.constexpr,
):
    pid_z = tl.program_id(0)
    pid_h = tl.program_id(1)
    pid_m = tl.program_id(2)
    z = pid_z
    h = pid_h
    m_start = pid_m * BLOCK_M
    m_offsets = m_start + tl.arange(0, BLOCK_M)
    m_mask = m_offsets < M
    # Load Q and GQ
    d_offsets = tl.arange(0, BLOCK_D)
    Q_ptrs = Q_ptr + z * stride_Q_z + h * stride_Q_h + m_offsets[:, None] * stride_Q_m + d_offsets[None, :] * stride_Q_d
    Q_mask = m_mask[:, None] & (d_offsets[None, :] < Dq)
    Q = tl.load(Q_ptrs, mask=Q_mask, other=0.0)
    GQ_ptrs = GQ_ptr + z * stride_GQ_z + h * stride_GQ_h + m_offsets[:, None] * stride_GQ_m + d_offsets[None, :] * stride_GQ_d
    GQ = tl.load(GQ_ptrs, mask=Q_mask, other=0.0)
    GQ_sig = 1.0 / (1.0 + tl.exp(-GQ.to(tl.float32)))
    Qg = Q * GQ_sig.to(Q.dtype)
    # Initialize accumulators
    l = tl.full((BLOCK_M,), -float('inf'), dtype=tl.float32)
    acc = tl.zeros((BLOCK_M, BLOCK_D), dtype=tl.float32)
    denom = tl.zeros((BLOCK_M,), dtype=tl.float32)
    scale = tl.rsqrt(tl.full((), Dq, dtype=tl.float32))
    for n_start in tl.range(0, N, BLOCK_N):
        n_offsets = n_start + tl.arange(0, BLOCK_N)
        n_mask = n_offsets < N
        # Load K and GK
        K_ptrs = K_ptr + z * stride_K_z + h * stride_K_h + n_offsets[:, None] * stride_K_n + d_offsets[None, :] * stride_K_d
        K_mask = n_mask[:, None] & (d_offsets[None, :] < Dq)
        K = tl.load(K_ptrs, mask=K_mask, other=0.0)
        GK_ptrs = GK_ptr + z * stride_GK_z + h * stride_GK_h + n_offsets[:, None] * stride_GK_n + d_offsets[None, :] * stride_GK_d
        GK = tl.load(GK_ptrs, mask=K_mask, other=0.0)
        GK_sig = 1.0 / (1.0 + tl.exp(-GK.to(tl.float32)))
        Kg = K * GK_sig.to(K.dtype)
        # Compute attention scores
        attn = tl.dot(Qg, tl.trans(Kg), out_dtype=tl.float32) * scale
        # Load V
        V_ptrs = V_ptr + z * stride_V_z + h * stride_V_h + n_offsets[:, None] * stride_V_n + d_offsets[None, :] * stride_V_d
        V_mask = n_mask[:, None] & (d_offsets[None, :] < Dv)
        V = tl.load(V_ptrs, mask=V_mask, other=0.0)
        V_fp32 = V.to(tl.float32)
        # Streaming softmax update
        attn_max = tl.max(attn, axis=1)
        new_m = tl.maximum(l, attn_max)
        exp_attn = tl.exp(attn - new_m[:, None])
        alpha = tl.exp(l - new_m)
        acc = acc * alpha[:, None] + tl.dot(exp_attn, V_fp32, out_dtype=tl.float32)
        denom = denom * alpha + tl.sum(exp_attn, axis=1)
        l = new_m
    # Store output
    denom = tl.maximum(denom, 1e-9)
    acc = acc / denom[:, None]
    Out_ptrs = Out_ptr + z * stride_Out_z + h * stride_Out_h + m_offsets[:, None] * stride_Out_m + d_offsets[None, :] * stride_Out_d
    Out_mask = m_mask[:, None] & (d_offsets[None, :] < Dv)
    tl.store(Out_ptrs, acc.to(tl.float16), mask=Out_mask)

def gdpa_attn(Q: torch.Tensor, K: torch.Tensor, V: torch.Tensor, GQ: torch.Tensor, GK: torch.Tensor) -> torch.Tensor:
    Z, H, M, Dq = Q.shape
    N = K.shape[2]
    Dv = V.shape[3]
    assert K.shape == (Z, H, N, Dq)
    assert V.shape == (Z, H, N, Dv)
    assert GQ.shape == (Z, H, M, Dq)
    assert GK.shape == (Z, H, N, Dq)
    Out = torch.empty_like(V[:, :, :M, :])  # (Z, H, M, Dv)
    # Strides
    stride_Q = Q.stride()
    stride_K = K.stride()
    stride_V = V.stride()
    stride_GQ = GQ.stride()
    stride_GK = GK.stride()
    stride_Out = Out.stride()
    BLOCK_M = 32
    BLOCK_N = 32
    BLOCK_D = 64  # Assuming Dq == Dv == 64
    grid = (Z, H, triton.cdiv(M, BLOCK_M))
    gdpa_attn_kernel[grid](
        Q, K, V, GQ, GK, Out,
        Z, H, M, N, Dq, Dv,
        *stride_Q, *stride_K, *stride_V, *stride_GQ, *stride_GK, *stride_Out,
        BLOCK_M, BLOCK_N, BLOCK_D,
    )
    return Out
'''
        return {"code": code}
</code></pre>