Sunday 17 June 2018

Review/Summary: Algorithms to Live By

This book is a fascinating attempt at bridging the gap between human problems and computer problems. Making decisions is hard work and you never really know whether you made the right choice.

For me the biggest lesson from this book is that most decisions are made by intuition, and intuition is bad at statistics. Bad choices in life might bring a lot of misery, but, no matter the outcome, knowing that you made the best possible choice using the best method will bring you solace. Many people prefer the romanticism of taking a leap of faith in situations like finding a place to live or a partner. That might be fine for the "lucky" outcomes, but sometimes you want something more than a dice roll. That is what this book was about for me: choosing to take control, not just relying on luck.

This book drills into several popular algorithms and tries to fit them into day-to-day life. Moderate computer science knowledge is useful, but the book is mostly presented in a casual way.


First chapter: optimal stopping problem

The problem is very broad; basically it tries to answer "when is the best time to stop?" When you pick a price to sell your car, when you try to find an employee, and so on. Any time you have a limited amount of time or items to evaluate, which is nearly always in reality, this solution helps you not to just rely on intuition. For example, if you evaluate too few offers for your car, maybe the best one has not been made yet. Conversely, if you stop too late, maybe the best offer is already gone.

The magic number, which is not easy to deduce, is 37%. The rule is to look at the first 37% of the options without committing, and then choose the next option that is better than all the previous ones. So whenever you have some metric and are trying to find the best option, evaluate at least 37% of the options, or wait at least 37% of the available time, before making the choice.

This solution sounded a bit magical at first, but it is the strategy with the highest probability of picking the very best option from the list, and that probability itself also turns out to be about 37%.
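A rough way to convince yourself of the 37% figure is to simulate it. The sketch below is my own illustration, not from the book: it deals out random candidate orderings, skips the first 37%, takes the first candidate better than everything seen so far, and counts how often that really was the best one.

import random

def secretary_trial(n, look_fraction=0.37):
    """One round of the look-then-leap rule on n randomly ordered candidates."""
    candidates = list(range(n))          # higher number = better candidate
    random.shuffle(candidates)
    cutoff = int(n * look_fraction)
    best_seen = max(candidates[:cutoff]) if cutoff > 0 else float("-inf")
    for score in candidates[cutoff:]:
        if score > best_seen:            # first candidate better than all in the look phase
            return score == n - 1        # did we pick the overall best?
    return candidates[-1] == n - 1       # nothing better appeared, forced to take the last one

trials = 100_000
wins = sum(secretary_trial(100) for _ in range(trials))
print(f"Picked the best candidate in {wins / trials:.1%} of trials")  # roughly 37%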

Second chapter: explore vs exploit

This is also an interesting one: the book shows how we often choose between broadening our choices (exploring) and sticking to our favorites (exploiting), which is not a trivial trade-off. How do we know that our favorite things are really the best? Maybe we just haven't been exposed to better things yet. Babies explore a lot, and as we grow older we generally tend to explore less and exploit what we have already learned.

But is there a way to figure out a healthy balance between the two? Always exploiting maximizes short-term payoff but can leave you in a bubble, while if you already know a lot, exploring further is less likely to turn up something better. There is no single clear answer, but you should ask yourself whether pleasure today is worth more to you than more pleasure tomorrow.
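The book frames this as the multi-armed bandit problem. One simple strategy for striking a balance is epsilon-greedy: mostly exploit your current favorite, but explore a random option some fraction of the time. This is my own toy illustration, not necessarily the book's recommended approach, and the "restaurants" and their quality numbers are made up.

import random

# Hypothetical options: each "restaurant" gives a good evening with some unknown probability.
true_quality = {"old favorite": 0.60, "new place A": 0.55, "new place B": 0.75}

def epsilon_greedy(options, rounds=1000, epsilon=0.1):
    visits = {name: 0 for name in options}
    wins = {name: 0 for name in options}
    for _ in range(rounds):
        if random.random() < epsilon or not any(visits.values()):
            choice = random.choice(list(options))                             # explore
        else:
            choice = max(options, key=lambda n: wins[n] / max(visits[n], 1))  # exploit the best so far
        visits[choice] += 1
        wins[choice] += random.random() < options[choice]                     # simulated good/bad evening
    return visits

print(epsilon_greedy(true_quality))  # most visits should drift toward "new place B"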

Third chapter: sorting

The book convinces the reader that the difference between rummaging through an unsorted pile and sorting it first can be huge. Take socks: pulling out random socks until you happen to match the one in your hand takes a long while, compared to applying even a simple sorting algorithm first.

In this part of the book, the time complexity of operations is the important thing to understand. Time complexity is a formula describing how the number of operations needed grows with the size of the input.

As the number of items to sort grows, time complexity matters more and more. A number of algorithms are introduced: bubble sort, insertion sort, merge sort, bucket sort and others along the same lines.

Unless the number of items is large, the choice doesn't really matter for a computer, but if you want to optimize some sorting practices of your own, it may be worth taking a look at them.
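For the curious, here is a minimal sketch of one of the O(n log n) algorithms the book lists, merge sort: split the pile in half, sort each half, then merge the two sorted halves. This is my own illustration of the standard algorithm.

def merge_sort(items):
    """Classic O(n log n) merge sort: split, sort the halves, merge them."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]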

Fourth chapter: caching

This chapter draws a convincing link between how computers and humans cache data. Computers have different tiers of memory, because using only the fastest memory would be too costly, and the distance to the processor also matters. There is main memory, but even that is much slower than desired, so caching is the solution: you predict when you will need an item next, and keep it close accordingly.

Humans also have several tiers of memory space, like your working desk, your closet, and your brain's own memory. These also use caching to predict what you will need: you place whatever you think you will need soon nearby, with less important things pushed into slower tiers.

The best and most natural way of organizing the cache turns out to be least recently used (LRU): the item you used last goes into the most accessible position, because it is likely that we will soon reuse the things we used recently. Most computers do this sort of caching to some extent.

But the most interesting thing the book taught me is that even a person's working memory follows this least-recently-used pattern. I've run simple experiments myself to confirm, for example, that the words you have used recently are the ones you can recall the fastest.
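A least-recently-used cache is simple enough to sketch in a few lines. This toy version is my own, not from the book: whenever it runs out of room, it evicts whatever has sat untouched the longest.

from collections import OrderedDict

class LRUCache:
    """Toy least-recently-used cache: the item touched longest ago is evicted first."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # touching an item makes it "most recent"
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used item

desk = LRUCache(capacity=3)
for paper in ["taxes", "novel", "bills", "taxes", "letters"]:
    desk.put(paper, "...")
print(list(desk.items))  # ['bills', 'taxes', 'letters'] - 'novel' was evicted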

Ever since reading it, I've adopted a new view of human memory and how amazing it is. The book compares memory to a really long shelf of books, each representing a memory, sorted by when it was last accessed. So it is easy to recall everything you have accessed recently. There is no real capacity limit to this shelf; it is just that searching it takes longer and longer, until some items take too long to find at all.

So in short, organizing your memory is important, because whatever you do fills up your cache, and you probably don't want to fill it with junk that crowds out access to more important things. It also suggests that, for recall, what matters is not so much what you consider important as what you have accessed recently.

Fifth chapter: scheduling

Computers schedule their resources to one program at a time, so how does everything appear to run at once? Each program gets a slice of time to run, and these slices are switched so fast that it looks like all programs are running simultaneously. There are constraints, though: give one program too big a time slice and everything else freezes. In general, scheduling is all about dividing limited resources under given constraints.

There is no one-size-fits-all solution. Even a simple rule like "always do the shortest task first" may mean never getting to the big tasks if short tasks keep coming in. It really depends on the case: in a food service, for example, standing in a queue is accepted, and serving the newest customer first just because their order is faster to fulfill would not be acceptable to the others.

There are some standard approaches though:

  • Earliest due date - always do the task with the earliest due date first; this minimizes maximum lateness.
  • Moore’s algorithm - like earliest due date, but whenever a task is about to be late, you throw out the longest remaining task and proceed as normal; this minimizes the number of late tasks.
  • Shortest processing time - do the shortest tasks first; you get a lot of tasks done quickly, but a big task might languish unnoticed
These seem useful, but finding the right algorithm for the conditions is mandatory; using the wrong one for the situation can have bad consequences. In general, weighing the importance of a task against its length works best: divide importance by the time required and do the densest tasks first, as in the sketch below. Putting out the fire comes first even though it takes longer than the shortest task, because its importance outweighs its length.
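To make the comparison concrete, these strategies boil down to sorting the same task list by different keys. The task data below is made up for illustration.

# Hypothetical task list: (name, duration in hours, due in hours from now, importance weight)
tasks = [
    ("put out fire", 3, 1, 10),
    ("answer email", 1, 8, 1),
    ("write report", 5, 24, 4),
    ("fix leaky tap", 2, 12, 2),
]

earliest_due_date = sorted(tasks, key=lambda t: t[2])         # minimizes maximum lateness
shortest_processing = sorted(tasks, key=lambda t: t[1])       # minimizes average completion time
weighted_shortest = sorted(tasks, key=lambda t: t[1] / t[3])  # weighs importance against length

for name, order in [("EDD", earliest_due_date), ("SPT", shortest_processing), ("Weighted SPT", weighted_shortest)]:
    print(name, "->", [t[0] for t in order])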

Sixth chapter: Bayes' rule

This part is all about probabilistic thinking and how people misjudge probabilistic data. By nature we tend to think in individual outcomes. For example, if you are given a diagnosis with a 50% chance of living eight months, you might think a year or two is the most you could hope for, given the odds. But in reality, if you know the distribution of outcomes, you will find there are always extreme cases where the same condition is not a death sentence at all.

The book introduces some core rules for making predictions from previous observations:

Bayes’ rule: P(A|B) = P(B|A)P(A)/P(B). It shows how to calculate the probability of an event (A) given some preexisting condition (B). Somehow this calculation is not intuitive for us. For example, imagine a city with only two taxi companies: blue taxis make up 20% of the cars and red taxis 80%. After an accident, a witness reports that she saw a blue taxi involved. Given that the witness identifies colors correctly 70% of the time, the probability that the taxi really was blue comes out to roughly 37%, which seems counterintuitive. Understanding situations like this, where our straightforward thinking leads us to the wrong result, can be empowering.
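Working the taxi example through Bayes' rule directly, using the numbers from above:

p_blue = 0.20              # prior: share of blue taxis in the city
p_red = 0.80
p_say_blue_if_blue = 0.70  # witness is right 70% of the time
p_say_blue_if_red = 0.30   # and wrong 30% of the time

# P(taxi was blue | witness says blue) via Bayes' rule
p_says_blue = p_say_blue_if_blue * p_blue + p_say_blue_if_red * p_red
p_blue_given_report = p_say_blue_if_blue * p_blue / p_says_blue
print(f"{p_blue_given_report:.0%}")  # about 37%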

Laplace’s rule of succession: it is natural to think that if you got 5 successes out of 10 tries, the probability of success must be 5/10 (50%). But this is often impractical with little data: if you draw a lottery ticket for the first time and win, that approach concludes you have a 100% chance of winning. Laplace's rule of succession helps in these situations: the estimate (successes + 1) / (number of tries + 2) works much better with little data. The same lottery example gives a 2/3 chance of success even if you won on the first try; if you keep winning, the estimate eventually approaches 100%. This is more practical, since it doesn't produce extreme results so easily.
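The rule itself is a one-liner; with one win in one try it already gives the more sensible 2/3 instead of 100%.

def laplace_estimate(successes, tries):
    """Laplace's rule of succession: a gentler estimate than successes / tries."""
    return (successes + 1) / (tries + 2)

print(laplace_estimate(1, 1))    # 0.667 instead of a naive 100%
print(laplace_estimate(5, 10))   # 0.5, matching the naive estimate once there is more data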


The Copernican principle: this is a very simple method of estimating how long something will last without any prior knowledge. You have most likely encountered an object somewhere near the middle of its lifetime, so its expected remaining lifetime is about the same as its current age. This gives weird results, like assuming a 10-year-old child will live only 10 more years, but that seems absurd only because we humans know so much about human life expectancy. If you really have no prior knowledge at all, this method is surprisingly good.

It is important to know the probability distribution of a given occurrence; the distribution tells us how far from the mean the extremes can go.

Normal distribution: the famous bell curve, which is how IQ is distributed, for example. Most people are very near the mean, and about 95% fall within two standard deviations of it. In this distribution values cluster around a central value and don't stray far from it; human lifespan, height and weight roughly follow it. An event is surprising when it happens early, because we expected it to reach the average, but not when it is late: the further past the mean it gets, the sooner we expect it to happen.

Power law distribution: most things are below the mean, but a few enormous ones sit far above it. Wealth works like that: the top 1% hold most of the wealth, the top 1% of that top 1% hold most of its wealth, and so on. This is a result of big things getting bigger; for websites, more exposure leads to more exposure, so growth is explosive. Here, the longer something has gone on, the longer we expect it to keep going.

Erlang distribution: something between the normal and power law distributions. Events are never more or less surprising no matter when they occur; however long something has already gone on, you expect it to continue for roughly the same fixed amount longer. Pedestrian traffic, for example, follows such a distribution.

This chapter taught me that people have very skewed views of the world, because we are more excited by, and enjoy more, the rare information. But to make good predictions we need accurate data from the whole distribution. If we are only interested in the unique 0.1% and only see the interesting 0.1% of cases in the news, we will develop unrealistic priors for calculating probabilities.

Seventh chapter: overfitting

We used to believe that more data and more variables lead to better predictions. But nowadays it is known that adding complexity can actually ruin the results. Sometimes overthinking a problem just wastes your time and leaves you with a worse solution.

A more complex model really would be better if we had perfect data. But we always have unknowns, and fitting a prediction model perfectly onto the data we have may produce perfect predictions on that data while failing badly on anything outside it. We cannot base a decision on the whole world's knowledge, so it is often better not to overthink, especially the unknowns.

There are ways to combat complexity: penalize a predictive model for being complex, so that a complex model has to do considerably better than its simpler alternatives before it is preferred.

A heuristic solution might be the best approach when dealing with complexity, or at least put a limit on the number of ideas considered, which itself acts as a kind of regularization.
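A tiny illustration of the idea, my own toy example rather than anything from the book: fit noisy data with a straight line and with a high-degree polynomial. The complex model hugs the sample points but swings wildly in between, which is exactly what failing on unseen data looks like.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 12)
y = 2 * x + rng.normal(0, 0.1, size=x.size)     # the underlying truth is a straight line plus noise

simple = np.polyfit(x, y, deg=1)                # 2 parameters
complex_ = np.polyfit(x, y, deg=9)              # 10 parameters: passes almost exactly through every point

x_new = np.linspace(0, 1, 100)                  # "unseen" points between the samples
truth = 2 * x_new
err_simple = np.mean((np.polyval(simple, x_new) - truth) ** 2)
err_complex = np.mean((np.polyval(complex_, x_new) - truth) ** 2)
print(err_simple, err_complex)                  # the simple model's error is typically much smaller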

Eighth chapter: relaxation

Often a problem is impossible to compute even with the whole world's computing power. In those cases relaxation can give you a similar result: you reduce the constraints of the problem, solve the easier version, and that may be good enough or at least point you in a better direction.

Similar to the complexity penalty in overfitting, Lagrangian relaxation turns hard constraints into penalties: you are allowed to break them, but breaking them costs you, which makes the problem easier to solve.

Rather than being idealistic and unable to solve tough problems, you can make more progress by solving easier ones. If you don't accept good-enough solutions, you might not be very productive. Most if not all things in this world have flaws and are designed to be good enough, with some acceptable failure rate, expiry time or something similar.

Ninth chapter: randomness

Some problems are too complex to solve exactly in a reasonable amount of time, but you can often use random sampling to get a good-enough estimate much faster. For example, a game of solitaire can be dealt starting cards that make it impossible to win, but calculating the probability of such a deal exactly is too difficult because of the huge number of possible positions. Simply playing the game many times, however, gives a good estimate.

Sampling can succeed where exact analysis fails, but the sampling has to be truly random, which is pretty difficult to achieve in most cases. There are also algorithms designed to avoid getting stuck in a local maximum: the best result you can reach by small adjustments from where you are, even though a better maximum exists somewhere else.

Hill Climbing - start from an initial solution and repeatedly make small random adjustments, keeping the ones that improve the result. It can get stuck on a local maximum, so random restarts or random jitter help with that.

Metropolis Algorithm - similar to Hill Climbing, but at any point it can also accept bad tweaks, not just good ones, a bit like mutations, precisely to escape local maxima.

Simulated Annealing - similar to the Metropolis Algorithm, but the rate at which bad "mutations" are accepted is gradually reduced, so the solution settles down over time.
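Here is a compact sketch of hill climbing with the simulated-annealing twist, maximizing a made-up bumpy function of my own. The "temperature" controls how often a worse move is still accepted, and it cools over time.

import math
import random

def bumpy(x):
    """A made-up function with several local maxima."""
    return math.sin(5 * x) + math.sin(2 * x) - 0.1 * x * x

def simulated_annealing(start=0.0, steps=10_000, temperature=2.0, cooling=0.999):
    x, best = start, start
    for _ in range(steps):
        candidate = x + random.uniform(-0.5, 0.5)        # small random tweak
        delta = bumpy(candidate) - bumpy(x)
        # always accept improvements; accept bad moves with a probability that shrinks as we cool
        if delta > 0 or random.random() < math.exp(delta / temperature):
            x = candidate
        if bumpy(x) > bumpy(best):
            best = x
        temperature = max(temperature * cooling, 1e-6)
    return best, bumpy(best)

print(simulated_annealing())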

Randomness is also a natural way for people to interact with new things. When you buy a new phone, you don't dissect it and analyze the schematics; you try pressing random things to see how it works. It is overwhelming to reason about everything the phone could do, so pressing buttons and seeing what happens is the easiest way to learn to operate it. As with many unfamiliar things, you start with random interactions to get to know the system, because we often have neither the time nor the need to completely break a thing down before we can use it.

Creativity can also be enriched with randomness. We often end up with tunnel vision when brainstorming; a quick random topic can bump you out of your local maximum and make you think outside the box. This ties in with the explore-versus-exploit idea: we want to explore a lot of random things first, just to make sure our local best result is not just a small bubble.

Tenth chapter: networking

Basic communication is not simple for people, nor is it for computers. If you send a message, neither side can ever be truly sure it was delivered. The two generals' problem illustrates this: two armies, each led by its own general, are preparing to attack a fortified city. They are camped on two hills, and the only way for the generals to communicate is by sending messengers through the valley between them. Unfortunately, the valley is occupied by the city's defenders, so any messenger may be captured. The first general sends the order "attack at midnight", but he wouldn't dare attack without knowing the second general received it. The second general, in turn, would not dare attack unless his confirmation has reached the first general, and so on; this can continue forever.

In computers we use the three-way handshake as a practical workaround for this problem:

  1. Host A sends a TCP SYNchronize packet to Host B
  2. Host B receives A's SYN
  3. Host B sends a SYNchronize-ACKnowledgement
  4. Host A receives B's SYN-ACK
  5. Host A sends ACKnowledge
  6. Host B receives ACK. 
  7. The connection is established.
Similarly, people require acknowledgement after talking for a while, saying "you know?" or similar phrases to probe the status of the conversation, be it over the phone or in person, when the listener doesn't seem to be paying attention.

Messages are often lost in a network. To recover, we could resend, but if both sides resend whenever something is lost, the transmission medium gets congested. To combat this, computers add some randomness to when they transmit: if both parties roll a die for how long to wait before speaking, chances are they will not speak at the same time. Humans do something similar: when you walk towards a pedestrian head-on, you have to dodge left or right, and if you both dodge in the same direction it doesn't work. Randomness can solve these kinds of problems.

Exponential backoff is an algorithm for spacing out repeated retransmissions. You don't want to tax the network by retransmitting constantly, but giving up is not good either, so the solution is to try again with an exponentially growing wait between attempts. After many failures the wait becomes quite long, so very few messages are sent. This can also work for human interactions: if someone disappoints you many times, rather than giving up entirely or retrying until you can't handle it anymore, you could give them a timeout that grows exponentially each time, so you neither burn bridges nor grant infinite mercy. Some courts are trying this system, because repeat offenders often get a lot of small warnings and then one big sentence when the judge is sick of seeing their face; with exponential punishment the escalation is no surprise, and the offender doesn't expect to get off scot-free next time.
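Exponential backoff itself is only a few lines. In this sketch of mine, flaky_send is a made-up stand-in for whatever unreliable operation you are retrying; the wait doubles after each failure, with a little random jitter.

import random
import time

def flaky_send(message):
    """Stand-in for an unreliable transmission: fails most of the time."""
    return random.random() < 0.2

def send_with_backoff(message, max_attempts=6, base_delay=0.05):
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        if flaky_send(message):
            return f"delivered on attempt {attempt}"
        time.sleep(delay + random.uniform(0, delay))   # wait, with jitter so peers don't retry in lockstep
        delay *= 2                                     # double the wait after every failure
    return "gave up"

print(send_with_backoff("attack at midnight"))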


Additive increase, multiplicative decrease is a way to control congestion: bandwidth use grows slowly and additively up to the maximum, but whenever there are transmission problems it gets cut in half, allowing rapid adjustment to increased traffic. People have a natural tendency toward this kind of behavior too: receiving a text message encourages sending another and another, but when a reply takes too long, you stop feeling inclined to keep sending. The approach also suggests why everybody at some point ends up working a job they are not good at: we keep getting promoted and rarely move back down the hierarchy. Maybe this algorithm is not ideal for humans, but for networking, ramping everyone up until failure turns out to be the best way to use all of the resources, and packets are not dissatisfied by being downgraded.
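Additive increase, multiplicative decrease in miniature: the sending rate creeps up by one unit per round until a loss, then gets cut in half, producing the familiar sawtooth pattern. The loss probability here is made up purely for the simulation.

import random

def aimd(rounds=30, loss_probability=0.15):
    """Additive-increase, multiplicative-decrease: +1 per good round, halve on a lost packet."""
    rate, history = 1.0, []
    for _ in range(rounds):
        if random.random() < loss_probability:
            rate = max(rate / 2, 1.0)    # congestion signal: back off sharply
        else:
            rate += 1.0                  # all went well: probe for a little more bandwidth
        history.append(rate)
    return history

print(aimd())  # a sawtooth: steady climbs interrupted by sudden halvings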


Flow control is quite important in networking. Even if there are delays in our speech over the internet, we would rather not have the lost parts resent; we accept some glitches in order to keep the conversation real-time. Low latency is very important for gaming and for talking, even regular speech: it matters to get regular feedback when telling a story, and a storyteller's whole enthusiasm can be ruined if the listener isn't paying enough attention and giving enough acknowledgements.

In many ways we have designed computers to communicate using the same strategies we have developed as social beings; it is quite impressive to break down and formalize the ways we talk to each other and have them simulated by billions of devices.

Eleventh chapter: game theory

Game theory is about anticipating the anticipations of others: outwitting the other players, getting into their minds, bluffing on multiple levels, and trying to work out whether you are being tricked while trying to trick the other players at the same time. This can spiral into an infinite recursion of players expecting what other players expect of them expecting what they expect, and so on.

"I think I know what you think. but have no idea what you think I think." is a simple belief we hold on to that is almost a requirement for many businesses to work. For example, in stock exchange one party is willing to sell their stock, thinking that it's value will drop. The other party assumes that they know something more and think that the stock price will rise, so they exchange. Or buying to resell, you can always try to find somebody who will want to pay more for the same item.

Nash equilibrium - a set of strategies where no player can do better by changing only their own strategy, which results in a balanced state of the game. Nash proved that every finite game has at least one such equilibrium (allowing mixed strategies). Unfortunately, for complex games this equilibrium is not easy to compute.

Just because both players follow the same strategy doesn't mean it is the best one for them. Take the prisoners' dilemma: you and a partner are arrested for robbing a bank. If you both keep silent, you both walk free and split the hidden money later; if you alone inform, you keep the whole amount; if you both inform, you split the sentence. Surprisingly, the best choice is always to inform: whatever the other player does, you either gain the whole loot or halve your sentence. Yet when both players reason this way, they end up worse off than if both had stayed silent.
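You can check the "always inform" logic mechanically by writing out a payoff table and looking for the strategy that is best no matter what the other player does. The payoff numbers below are my own made-up values; only their ordering matches the story above.

# Hypothetical payoffs (higher is better) for (my choice, their choice).
payoff = {
    ("silent", "silent"): 3,   # we split the loot
    ("silent", "inform"): 0,   # they take everything, I take the fall
    ("inform", "silent"): 5,   # I take everything
    ("inform", "inform"): 1,   # we split a reduced sentence
}

def best_response(their_choice):
    return max(["silent", "inform"], key=lambda mine: payoff[(mine, their_choice)])

for theirs in ["silent", "inform"]:
    print(f"If they stay {theirs}, my best response is to {best_response(theirs)}")
# Informing wins in both cases (a dominant strategy), yet (inform, inform) pays less than (silent, silent).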

Players acting rationally in their own self-interest can produce very bad outcomes at large scale; in these situations the price of anarchy is high. For example, toxic leaded gasoline was just 10 cents cheaper, and what does it matter if one person buys it? Nothing much. But if everyone thinks like that, the result is pollution on a grand scale. And thus an equilibrium is established.

Even if people don't want it, an equilibrium has been established that makes people in the US work such long hours. As more people accepted working longer hours, it became a competition, and others unwillingly had to work more just to stay competitive. People even tend not to use their vacation days, because they want to appear loyal and hardworking. Even when a company offers unlimited vacation days, people don't take them, because nobody wants to be the one who takes the most vacation, so nobody ever does. The Nash equilibrium is zero.

The same can apply to businesses. If one shop decides to open on public holidays or at irregular hours, it forces the others to adapt and do the same, otherwise their customers might be lost to more accessible competitors. It can be absurd, since this happens unwillingly most of the time; once the Nash equilibrium is set, there is not much an individual player can do to change it.

It is possible to worsen every outcome and yet make everyone's lives better by shifting the equilibrium toward what would be considered optimal. Changing the rules of the game can do this, and it is where a government or some supervising entity can help, for example with laws mandating minimum vacation or limiting shop hours. Punishing the old equilibrium makes it unattractive, and a new one can form.

Religion also regulates the equilibrium to some extent. Moral codes shape people through limitations that should, in the end, improve their lives, even when selfishness would be beneficial locally.

In nature there is no easy way to change an equilibrium. Jungle trees, for example, compete for height to get more sunlight, so they invest as much of their nutrients in growth as possible. If all trees could agree not to grow too tall, they could spend those nutrients in much more valuable ways, but there is nothing to regulate that.

Game theory also tells us that catastrophes like financial bubbles can happen even when no one is at fault: everyone invests because everyone else is investing, creating a positive feedback loop.

A group of agents who are all behaving rationally can nonetheless fall prey to effectively unbounded misinformation: an information cascade.

Often people acting on a whim, without good information, can create the false impression that they hold important private information, which leads to more and more confusion in the public information domain. In auctions this can produce a misinformation cascade, so it is important not to use other bidders' behavior as your information source. We should not make assumptions based purely on what people are doing, but think about why they are doing it.

Final thoughts

We tend to judge the world by outcomes, but rarely see the road traveled to reach them. Thinking that people simply got lucky, or are simply better, because they got some outcome is naive; it is important to acknowledge the role of process in getting there. Yes, luck plays a big role, but the most optimal approach will bear more fruit sooner or later.

I've only written down my main thoughts around the subjects this book brings up. The book can be very philosophical at times, so much of this is purely subjective and can be interpreted in different ways. Nonetheless, this book can be life changing and offers remarkable conclusions that help us see who we really are and why we do certain things. I suggest reading it if you seek to understand yourself and the world better. It may not be a walk in the park, but if you can push through the tougher concepts, it will definitely be worth the effort.

Friday 8 June 2018

Growing up

This post is an attempt to right my wrongs and finally become a real adult. Unfortunately, my life has been a great struggle with myself: feeling proud when I win, but trapped in my mind's games most of the time.

I feel like most of my life has been me trying to avoid actual living; like many, I would spend countless hours absorbed in all kinds of useless information streams. What made it worse was my talent for discipline, but the wrong kind: I would even force myself to spend more time avoiding life than I was comfortable with. Surely such escapism would lead to depression, which would have been ideal, a chance to make a change. But no, somehow my clever mind kept a careful balance, staying between being barely content with my life and utter depression.

The mind is so clever at staying in comfort. It builds walls and identities that make it extremely hard to change things. But it is time to make a change; while the beast of broken dreams sleeps, I will make my escape.

I suspect that my suppression of life comes from not being comfortable with all the open questions that cannot be answered at this point in life. Overthinking everything and coming to lazy, pessimistic answers can probably discourage anybody, so I'm not really surprised that my mind needs some rest from all the uncertainty. Everything seems to fade away and there is no anxiety when you are busy with something, be it immersing yourself in a movie or a video game, where everything has a boundary and the rules are known; that gives me a momentary peace of mind. No fear, no unknowns, and yet it doesn't feel right. It reminds me of an episode of a TV show where a smart guy decided to become dumb so he could be happier with his less talented wife.

Escapism is tempting, but it is not real. After hours of bombarding my brain with information, I feel empty and regretful. Even when I think it is time to stop, the cries for help can still be suppressed by the same mechanism. Not being in control is frightening, but what is even worse is that time flies when you are being controlled, all the opportunities lost somewhere in the past.

I want to be free. I do have dreams and aspirations. It is time to stop this madness, or else there is no hope left. I want to read hundreds of books and learn many skills. I want to be proud of myself and become the best version of myself that I can. The answers to the unknowns will come eventually, and when they do, they will not be fake consolations answering questions of limited scope, but the glorious and absolute truth.

I will list the things that I want to do when I grow up.


  • learn to draw
  • learn the art of oratory
  • become fit and strong
  • learn math proofing
  • learn abstract math
  • learn spiritual ways
  • learn to rap
  • learn ethical hacking
  • learn some cryptography
  • learn to write blogs
  • learn a music instrument
  • read all the books I want
Please, be strong and remember why you want to be free: do you want to live in hell, or create a heaven for yourself?