The air was thick with tension. It was the middle of the night, yet everyone sat frowning at their computers with deadly serious expressions. Who were these people, and what were they doing?
They were programmers from the Sunline big data team, stationed on the Minsheng Bank credit card project. Min Gong, who was in charge of the project, still remembers the scene vividly:
In April 2020, our project team received an urgent request from Minsheng Bank’s Credit Card Center to implement its Dream Value Engine, with the launch scheduled for June 16, 2020.
What is Dream Value? Put plainly, it works much like reward points. Users earn Dream Value by completing tasks in the Minsheng Bank credit card app and can then exchange it for goods. The Dream Value Engine project mainly provides the service interfaces and data processing for Dream Value, ensuring that the points are accurate.
With only a month and a half to complete the project, and extremely strict requirements for stability, scalability, performance and more under 24/7 online operation, it was a heavy task! However, there is no such word as ‘impossible’ in Sunline's dictionary, so we rolled up our sleeves and got to work.
At first, everything progressed in an intense but orderly manner, and we felt we were in control, until the credit card testing team of Minsheng Bank ran integration testing.
It was June 10 when the testing team found that the Dream Value Engine sometimes failed to expand normally.
Seeing this, we were a little stunned, because the accuracy of Dream Value is the foundation of the entire project! If the problem occurred after going live, it would be difficult to fix quickly and the consequences would be very serious!
With only 6 days left before the official launch, we had to find the cause and fix it quickly. The hardest part was that the bug did not occur every time. Nobody knew whether or when it would recur, so it was impossible to pinpoint directly.
Our project team immediately held a meeting to learn about the test scenarios from the testing team and to work through the possible causes.
Han said, “Perhaps it is a network problem?”
Liu said, “Is there an omission in the upstream information provided, an error in the consumption information, or a problem when adjusting Dream Value?”
Yunyun said, “Is there something wrong with Redis?”
Others guessed that the issue might lie with Kafka, MySQL, HBase, and so on.
After a wide-ranging discussion, our team reached an agreement. Given how unpredictable the bug was, we decided to adopt the most time-consuming but most reliable investigation plan: start from the source and analyze step by step. We split the investigation into sections, analyzed the various logs, monitored the source and so on, and everyone worked tensely, hunting for the bug.
At 1 o’clock in the morning, the project team finally located the source of the problem. It turned out that multiple messages related to the same account had been put into Kafka. The first message took a lock and held it for a short time without releasing it, while the other messages skipped the waiting period instead of waiting for the lock, so those messages were lost.
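The failure pattern described above can be illustrated with a minimal Python sketch. This is not the project's actual code or its actual fix, which the story does not detail. The in-process `threading.Lock` stands in for whatever per-account lock the engine really used, and every name here (`AccountLocks`, `apply_skip_on_busy`, `apply_wait_for_lock`, `acct-1`) is hypothetical.

```python
import threading


class AccountLocks:
    """Hypothetical in-process stand-in for a per-account lock service."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def lock_for(self, account_id):
        # Return the single lock object associated with this account.
        with self._guard:
            return self._locks.setdefault(account_id, threading.Lock())


def apply_skip_on_busy(balances, locks, account_id, delta):
    """Buggy pattern: if the account is locked, skip the wait and drop the message."""
    lock = locks.lock_for(account_id)
    if not lock.acquire(blocking=False):
        return False  # the update is silently lost
    try:
        balances[account_id] = balances.get(account_id, 0) + delta
        return True
    finally:
        lock.release()


def apply_wait_for_lock(balances, locks, account_id, delta, timeout=5.0):
    """Safer pattern: wait for the lock, and fail loudly if it never frees up."""
    lock = locks.lock_for(account_id)
    if not lock.acquire(timeout=timeout):
        raise TimeoutError(f"could not lock account {account_id}")
    try:
        balances[account_id] = balances.get(account_id, 0) + delta
        return True
    finally:
        lock.release()


def demo():
    locks = AccountLocks()
    balances = {"acct-1": 0}
    lock = locks.lock_for("acct-1")

    lock.acquire()  # the first message is still being processed
    applied = apply_skip_on_busy(balances, locks, "acct-1", 10)
    after_skip = balances["acct-1"]

    # The second attempt waits in a worker thread until the lock is released.
    worker = threading.Thread(
        target=apply_wait_for_lock, args=(balances, locks, "acct-1", 10)
    )
    worker.start()
    lock.release()  # the first message finishes
    worker.join()
    return applied, after_skip, balances["acct-1"]
```

In this sketch, `demo()` shows the skipped update leaving the balance unchanged, while the waiting version applies the same update once the lock is released. Raising on timeout rather than returning silently is a design choice that makes a lost message visible in logs.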
After locating the problem, everyone was a little relieved, but no one dared to slack off and we set to work on a fix. At 3 o’clock in the morning, Yunyun finally found a solution in the official manual. Our team modified the source code and, once self-testing was complete, immediately submitted it to the professional testing team. The regression test passed and the bug was fixed.
We had fought through the night and were all sleepy. However, our hearts were filled with the joy of having fought hard and won a big victory. It was already past 4 o’clock in the morning when we left the office.
On June 16, the engine finally launched successfully. Not only did it withstand the peak load of the 6/16 event, it also demonstrated superior performance. Our project team was highly praised by the leaders. For us, this is a recognition not only of Sunline's service capability but also of our team’s hard work!