MongoDB 索引的 ESR 規則詳解

建議搭配第一篇 MongoDB 不懂 ESR 別說你會用 Index !! 一起閱讀

我們建立 MongoDB 索引時需要掌握 ESR 原則，所謂的 ESR 代表的是 E(quality)、S(ort)、R(Range)，也就是説，索引欄位順序必須是這個順序，才能達到最佳效能。

這是一個大原則，相同類型的欄位可以重複，例如：

EESR (O)
ESRR (O)
ER (O)
EER (O)
ESER (X) 像這種違反大原則的就不符合規則

我們遵循 ESR 規則的原因是使得每次查詢都能夠以最少的 KEY 掃描次數就能找到文件為目標，這樣速度會是最快的。大家可能受到 RDB 觀念影響，就很容易發生未遵照 ESR 原則的情況，這樣會使得查詢未達最佳化。

如果沒按照 ESR 規則，MongoDB 的查詢速度肯定會受到影響，其次是第一階段會使用 RAM 進行索引排序，也就是後面會提到的 sort、sort_key_generator; 第二階段，一但超過記憶體使用上限，就必須使用硬碟排序，此時必需啟用 allowDisk 設定，速度會再更慢。附帶一提，MongoDB find 查詢的參數順序是沒影響的，例如 .find({ fieldA: true, fieldB: 3 }) 等同於 .find({ fieldB: 3, fieldA: true })

這邊要強調，實務上並非所有情境都需要即時得到查詢結果，例如週期性產生報表的排程作業就沒有這麼高的即時性，刻意建立相關的索引會換來寫入速度降低。

使用測試資料來演示 ESR

這邊我們先建立了五筆資料與一把索引，特別注意索引刻意先把 Range 建立在 Equality 前面。

db.myctw.insertMany([
  { eqField: true , rangeField: 1, sortField: 1 },
  { eqField: false , rangeField: 2, sortField: 2 },
  { eqField: true , rangeField: 3, sortField: 3 },
  { eqField: true , rangeField: 4, sortField: 4 },
  { eqField: true , rangeField: 5, sortField: 5 },
]);

db.myctw.createIndex({ rangeField:1, eqField:1 }, { name: "esrIdx"});

(以下 explain 內容僅保留本篇文章關注的欄位)

Step1 查詢 Range

{ rangeField: { $gte:2, $lt:5 } }

db.myctw.find({ rangeField: { $gte:2, $lt:5 } }).explain('executionStats').executionStats;
/* Output */
{
  executionSuccess: true,
  nReturned: 3,
  totalKeysExamined: 3,
  totalDocsExamined: 3,
  // ...
}

主要觀察的參數有三個：

nReturned: number of returned，回傳的文件數
totalKeysExamined: 總共檢驗的 index key 數量
totalDocsExamined: 總共檢驗的文件數量

這三項參數的比例是 1:1:1 最好，最完美狀況是 1:1:0 (即 Covered Query)，但能使用的情境太少了，就不當作期望標準。

Step2 查詢 Range + Equality

{ rangeField: { $gte:2, $lt:5 }, eqField: false }

db.myctw.find({ rangeField: { $gte:2, $lt:5 }, eqField: false }).explain('executionStats').executionStats;
/* Output */
{
  nReturned: 1,
  totalKeysExamined: 3,
  totalDocsExamined: 1,
  // ...
}

可以看到符合條件的文件數量為 1，不過 MongoDB 還是檢驗了三把 Index key。如果這時後我們把 equality 放到前面來會發生什麼事？

Step3 重新建立索引順序，再次查詢 Range + Equality

{ rangeField: { $gte:2, $lt:5 }, eqField: false }

db.myctw.dropIndex('esrIdx');
db.myctw.createIndex({ eqField:1, rangeField:1 }, { name: "esrIdx"});
db.myctw.find({ rangeField: { $gte:2, $lt:5 }, eqField: false }).explain('executionStats').executionStats;
/* Output */
{
  nReturned: 1,
  totalKeysExamined: 1,
  totalDocsExamined: 1,
  // ...
}

這時效果就發揮出來了，所有值都是 1

加入索引的情況

首先我們使用相同資料，並且把剛剛排序欄位加上索引內 { eqField:1, rangeField:1, sortField:1 }

db.myctw.dropIndex('esrIdx');
db.myctw.createIndex({ eqField:1, rangeField:1, sortField:1 }, { name: "esrIdx"});

db.myctw.find({ rangeField: { $gte:2, $lt:5 }, eqField: false }).explain('executionStats').executionStats;
/* Output */
{
  executionSuccess: true,
  nReturned: 1,
  executionTimeMillis: 1,
  totalKeysExamined: 1,
  totalDocsExamined: 1,
  executionStages: {
    stage: 'SORT',
    nReturned: 1,
    works: 5,
    advanced: 1,
    needTime: 3,
    isEOF: 1,
    sortPattern: { sortField: 1 },
    memUsage: 71,
    memLimit: 33554432,
    inputStage: {
      stage: 'SORT_KEY_GENERATOR',
      nReturned: 1,
      works: 3,
    //...
    }
  }
}

這邊可以看到多了一個 stage，進行 SORT_KEY_GENERATOR，這就是使用 RAM 進行索引排序的階段。

如果我們調整了索引順序，改成 ESR，看看會發生什麼事情？

db.myctw.dropIndex('esrIdx');
db.myctw.createIndex({ eqField:1, sortField:1, rangeField:1 }, { name: "esrIdx"});

db.myctw.find({ rangeField: { $gte:2, $lt:5 }, eqField: false }).sort({ sortField:1 }).explain('executionStats').executionStats;
/* Output */
{
  executionSuccess: true,
  nReturned: 1,
  totalKeysExamined: 2,
  totalDocsExamined: 1,
  executionStages: {
    stage: 'FETCH',
    nReturned: 1,
    works: 2,
    advanced: 1,
    docsExamined: 1,
    inputStage: {
      stage: 'IXSCAN',
      nReturned: 1,
      works: 2,
      // ...
    }
  }
}

可以看到剛剛的 SORT_KEY_GENERATOR stage 就不見了，整個流程也是只有 IXSCAN。

透過上述的範例，在資料量大時更為明顯，因此要先設想好需求的查詢條件，在建立新的索引與查詢務必先透過 explain 來評估是否符合預期。