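An attempt to describe two cases using the SBI (Situation, Behavior, Impact) model. For a detailed explanation of SBI, see here first.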
Stage | Sub-category | Category | Description |
---|---|---|---|
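Situation | Situation | Define the problem | A few months ago, my team experienced a major service outage. |
Situation | Situation | Initial situation | I knew something was wrong because I received a system alert; at the time, I was the operations owner. |
Situation | Target | First immediate action | I responded to the alert and started checking the system dashboards. |
Situation | Target | Problem details | 1. I found that our order volume had dropped sharply. 2. The problem was that I did not have enough time to work out what had gone wrong at the root-cause level. |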
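Behavior | Action | Decide how to act | I knew I could not solve this alone because our system is large, so I started a conference call with several other engineers on the team. After four hours we still could not pin down the problem, so we asked for the system to be rolled back. |
Behavior | Action | Why this course of action | An investigation like this reaches a point of diminishing returns. This is a feature any customer on Amazon can use, you never want a sustained impact on customers like this, and we had already received several customer service calls. So once you have a few confirmed customer service cases, it is best to roll back first and then set aside time for root-cause analysis. |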
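Impact | Result | Result of the action | The investigation took several days, and the outage not only affected real customers but also caused an actual loss of revenue. In the end, we built a fix, deployed and tested it in production, and then deployed again. |
Impact | Result | Long-term follow-up | We expanded and updated our test suite to prevent the problem from recurring. |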
Stage | Sub-category | Category | Description |
---|---|---|---|
Situation | Situation | Define the problem | We found that one of our services was not performing well. |
Situation | Situation | Initial situation | We got a call from a customer saying that our sentiment analysis system had a problem: its results were incorrect. |
Situation | Target | First immediate action | I knew the API, so I checked the API endpoint first. |
Situation | Target | Problem details | 1. I found the results were very bad: every sentence was classified as negative. 2. The system had produced no warnings or error messages. |
Behavior | Action | Decide how to act | I tested the API for a few minutes and suspected that the service was using the wrong model version, so I checked the change history to find the cause. |
Behavior | Action | Why this course of action | 1. I knew the service was stable and had not been updated that week. 2. I knew the API structure was very simple, so I thought it was just a configuration mistake that I could handle myself. |
Impact | Result | Result of the action | After an hour, I found that the model had been changed 10 hours earlier, so I fixed the configuration and restarted the service. The cause: that week, 3 new co-workers and 15 interns had joined us, so we introduced them to some of our stable services, and this API was one of them. We asked the new co-workers to walk the interns through the code. We thought it was a good idea: the co-workers could go through the code line by line, and the interns could learn how to read it. But a problem occurred: one new colleague thought the service was a test server and changed some of its configuration during the walkthrough. The good news was that only a few end users received wrong results; we emailed them and apologized. |
Impact | Result | Long-term follow-up | We set up review rules: before anyone pushes a new commit to production, they need approval from other team members. |