Tuesday, April 28, 2020

What are Google's Four Main Areas of Focus in 2020?

At Google, we'll continue to be focused on the 4 key...…
 
First, creating the most helpful products for everyone
 
Second, providing the most trusted experiences for our users.
 
Third, executing at scale and being nimble
 
And finally, creating value, optimizing and prioritizing strategic areas of investment

Monday, April 27, 2020

What is Chaos Theory?

The financial markets are watched by two groups of people: one that thinks the market is 100% efficient, and another that thinks it is completely random.
So who is right?
The truth is that the markets are complex, chaotic systems, and their behavior contains both an efficient and a random component. Therefore we can make a realistic stock market forecast, although it is precise only to a certain extent.
Complex chaotic systems are sensitive to minor changes: a small disturbance can cause a big perturbation that pushes the system far away from its equilibrium. This is called the butterfly effect. So if anything small happens anywhere in the world, the financial markets are usually among the first to be affected.
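To make the butterfly effect concrete, here is a minimal sketch using the logistic map, a standard textbook example of a chaotic system. It is not a model of any market; the function name, the starting value 0.2 and the tiny offset are all arbitrary choices made for illustration. Two runs that start almost identically end up completely different after a few dozen steps.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate the logistic map x -> r * x * (1 - x) and return every value."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two starting points that differ by one part in ten billion (illustrative values).
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)

# Distance between the two runs at each step.
gaps = [abs(x - y) for x, y in zip(a, b)]

for step in (0, 10, 20, 30, 40, 50, 60):
    print(f"step {step:2d}: gap = {gaps[step]:.3e}")
```

The gap starts out negligible and grows roughly exponentially until the two runs have nothing in common, which is why forecasts of such systems are only precise over a short horizon.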

Wednesday, April 8, 2020

The most expensive bug in history...

 

The development team got a call from the CEO: they needed to implement a new feature in SMARS (their system) so that it could take trades from dark pools in addition to the regulated markets.

 

The development team had 30 days to comply with the new rule. Although the CEO himself was not in favour of this new feature, he had to go ahead with it anyway, as it meant retaining existing business.

 

The release date for the new feature was set for 9:30 AM ET on August 1, 2012.

 

The plan was to deploy the new system behind a feature flag the week before the deadline; when the market opened on August 1, they'd simply turn it on.

 

At 9:30 AM ET on August 1, the Knight developers did just that: they enabled the feature flag, and SMARS began to route orders through to the RLP. They were live!

 

But something was wrong. Their charts showed anomalous spikes in trading activity on the open markets. At 9:34 AM, the NYSE called: Knight was executing a lot of trades, so many, in fact, that trading volumes for the entire market were double their normal level.

 

To make matters worse, the trades they were making didn't make sense. SMARS appeared to be buying high and selling low. At the current rate, they were losing thousands of dollars per second.

 

Alerted to the problem, Knight's Chief Information Officer called the top operations engineers together to try to identify the root cause. The rogue orders seemed to be originating from the new RLP router code, but no one could pinpoint the bug.

 

Twenty minutes had screamed by since the market opened, and the unauthorized trades executed by SMARS already totalled well into the billions of dollars. It was time to roll back and ask questions later.

 

With a shaky sense of relief, the operations team scrambled to check out the last known stable version of SMARS and deploy it to their 8 production servers.

 

To their horror, as soon as the router restarted, trading volumes on the NYSE spiked again: they were now executing even more trades than before.

 

At 9:58 AM, the Knight developers shut down SMARS entirely. It had been 8 minutes since rolling back the RLP code, and 28 minutes since the market opened.

 

They'd just cost their company $460 million.

 

But what actually happened?

When the developers of Knight's high-frequency trading algorithm replaced some unused legacy code, they repurposed a feature flag which had previously been used to activate that legacy code.

 

The deployment was a success for 7 of their 8 servers, but the deploy to the 8th server failed silently, meaning that one server was still running the legacy code. When they enabled the feature flag, 7 servers operated as expected; the 8th executed the legacy code, which should never have run in production.
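The failure mode is easier to see in a toy simulation. The sketch below is not Knight's code: the `Server` class, the build labels, the server names and the routing strings are all hypothetical, and the only assumption taken from the story is that the same flag means one thing to the new build and another to the old one.

```python
from dataclasses import dataclass

LEGACY = "legacy"  # old build: the flag still switches on the dead code path
NEW = "new"        # new build: the same flag now means "route to the RLP"


@dataclass
class Server:
    name: str
    build: str

    def handle_order(self, flag_on: bool) -> str:
        """Return which code path this server takes for an incoming order."""
        if not flag_on:
            return "normal_routing"
        if self.build == NEW:
            return "rlp_routing"
        return "legacy_code"  # stale server interprets the flag the old way


def deploy(servers, failed=()):
    """Upgrade every server except those whose deploy (silently) failed."""
    for server in servers:
        if server.name not in failed:
            server.build = NEW


servers = [Server(f"srv{i}", LEGACY) for i in range(1, 9)]
deploy(servers, failed={"srv8"})  # the 8th deploy fails without any error

# Market open: the flag is switched on for the whole fleet.
results = {s.name: s.handle_order(flag_on=True) for s in servers}
for name, path in results.items():
    print(name, path)
```

Nothing in `deploy` reports the failure, so the only visible symptom is the behaviour of the one stale server once the flag is on.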

 

Instead of re-deploying the new code to the 8th server, they decided to roll back to the last known good state. Unfortunately, they didn't know that the problem was the feature flag, and it didn't cross their minds to turn it off. When the old system was re-deployed, every server began to run the legacy code, dramatically compounding their losses.
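A rough sketch of why the rollback made things worse, under the same assumption that the flag triggers the legacy path on any old build. The `behaviour` function and the build labels are hypothetical, chosen only to illustrate the sequence described above.

```python
def behaviour(build: str, flag_on: bool) -> str:
    """Which code path a server takes, given its build and the flag state."""
    if not flag_on:
        return "normal_routing"
    return "rlp_routing" if build == "new" else "legacy_code"


def count_legacy(builds, flag_on: bool) -> int:
    """Number of servers currently executing the legacy code path."""
    return sum(behaviour(b, flag_on) == "legacy_code" for b in builds)


before_rollback = ["new"] * 7 + ["legacy"]  # one deploy had failed
after_rollback = ["legacy"] * 8             # old build restored everywhere

print("flag on, before rollback:", count_legacy(before_rollback, True))
print("flag on, after rollback: ", count_legacy(after_rollback, True))
print("flag off, after rollback:", count_legacy(after_rollback, False))
```

In this toy model, rolling back the code without touching the flag turns one misbehaving server into eight, while switching the flag off would have stopped the legacy path on every build.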

 

 

Where did they go wrong?

 

The Knight developers should never have allowed dead code to remain in their app for so long. Had they been more proactive, they could have easily avoided catastrophe. Reusing a feature flag was a dumb mistake that just shouldn't have been made. The developers weren't entirely to blame, though: if there's one certainty in life, it's that we will make mistakes.

 

 

When they deployed SMARS, they didn't have an automated deployment pipeline, instead relying on their engineers to manually deploy the new code; as a result, they missed an important step on the ill-fated 8th server. When the first mistake led to a crisis, their monitoring was inadequate, and they didn't have documented incident response procedures which could have prevented them from making an even worse mistake under pressure.
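One simple safeguard an automated pipeline can provide is a version check across the whole fleet before any flag is switched on. The sketch below is only an illustration of that idea: the function names, server names and version strings are made up, and a real pipeline would query each server rather than read from a dictionary.

```python
def find_stale_servers(reported_versions: dict, expected: str) -> list:
    """Return the names of servers that are not running the expected version."""
    return sorted(
        name for name, version in reported_versions.items() if version != expected
    )


def safe_to_enable_flag(reported_versions: dict, expected: str) -> bool:
    """Only allow the flag to be enabled when every server is up to date."""
    return not find_stale_servers(reported_versions, expected)


# Hypothetical post-deploy report: 7 servers upgraded, 1 left behind.
fleet = {f"srv{i}": "2.0" for i in range(1, 8)}
fleet["srv8"] = "1.9"

stale = find_stale_servers(fleet, expected="2.0")
if stale:
    print("Do not enable the flag; stale servers:", stale)
else:
    print("All servers match; safe to enable the flag.")
```

A check like this turns a silent deployment failure into a loud one, before the market opens rather than after.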