Dr APJ Abdul Kalam – RIP

Dr Kalam used to live in the house behind mine before he came President. The day he became President, the only person allowed in his house was the dhobi! He still has the photo hanging under his tree.
I still remember when APJ Abdul Kalam spoke at IIT Bombay. He was one of my inspirations.


Image credit - Thinkplus creatives

Creating a Fedora 21 LiveUSB in Ubuntu 14.04

Sadly, Ubuntu’s startup disk creator does not allow you to create fedora images.

The officially sanctioned way to create a fedora liveusb in ubuntu is the following:

sudo aptitude install isomd5sum python-parted python-pyisomd5sum python-urlgrabber extlinux python-qt4 python-qt4-dbus tar udisks libudisks2-dev

git clone https://github.com/lmacken/liveusb-creator.git

sudo ./liveusb-creator

The correct way to update/change git submodules

  • rm -rf .git/modules/interesting_modules
  • delete the lines containing “[submodule “interesting_modules”] url=”http://something/”
  • update .gitmodules
  • run “git submodule sync”
  • run “git submodule init”
  • run “git submodule update” – at this point, a new checkout should happen.
  • new checkout will fail about a version mismatch. This is expected – the version of the submodule stored in the super-project is mismatched. “git status” should also show you “modified” in the submodule directory.
  • “cd interesting_submodule”
  • “git reset –hard HEAD”
  • enjoy !

The Android ART runtime is a Golang tunnel

I’m willing to bet that the first reason that Android switched to ART from Dalvik is the possibility of linking directly to object code from Golang.

The problem is not speeding up individual apps…the problem is that the core of Android is built in Java and therefore is accessible to another application over the same language/VM. Now the issue is how do you get golang to link to this core?

ART compiles Java down to object code and is now able to link across language boundaries – theoretically, this means that I should be able to now use Android libs from Python as well.

Or, as is more likely, golang

Setting up btcd + Go build for bitcoin

My last post was about setting up the build system for Bitcoin reference system 0.9.0.

There is an alternative architecture for Bitcoin called btcd which is developed by Conformal Systems. This is claimed to be compatible with the main blockchain (including bugs).

There is a very interesting thread about how the btcd architecture (especially the split wallet/client and daemon architecture) has been adopted in the reference client at 0.9.0

I find Go very, very pleasant and productive to work and understand and it’s package manager is absolutely brilliant.

To setup your machine to work with btcd is absolutely trivial. Remember that this should be the same on any platform (Windows, Linux and Mac) since Go is cross platform in general. Only the particular binary of Go would be different.

Download and unpack Go from http://code.google.com/p/go/downloads/list . I used http://code.google.com/p/go/downloads/detail?name=go1.2.1.linux-amd64.tar.gz&can=2&q= because I’m on Linux-64 bit but go ahead and use the one that you’re on.

Assuming you unzip it to /home/sss/Code/go, set the following variable:

export GOROOT=/home/sss/Code/go

Test your Go installation by running /home/sss/Code/go -v . Ideally this environment variable should be in your zshrc, bashrc, etc. This never changes.

Now, create a directory called /home/sss/Code/mybtcd. This is your new workspace. When you are working on a particular workspace, set the following environment variable:

export GOPATH=/home/sss/Code/mybtcd

This tells your Go package manager, the location of your top level workspace directory.

Now, to get btcd and all its dependencies as well as compile it in one shot, run:

/home/sss/Code/go/bin/go get github.com/conformal/btcd/…

After a few minutes, you should have the following directories (which complies with Go’s recommended workspace directory structure)

./bin/ -> all your binaries

./pkg/ -> all third party library dependencies

./src/ -> all btcd as well as dependent third party source.

Running your bitcoin daemon is simply ./bin/btcd (help is at ./bin/btcd –help)

To hack your code, just write your code in ./src/github.com/conformal/btcd/ and run

~/Code/go/bin/go install -v -x  github.com/conformal/btcd/

All dependencies and binaries get rebuilt. Simple.

Compiling bitcoin 0.9.0 – the transaction malleability fix

I had a bit of trouble compiling Bitcoin 0.9.0 (which contains the all important “transaction malleability” fix). So I’m posting here for the benefit of everyone. This is done on an Ubuntu 12.04 machine – which is relevant only for the system packages (if you’re on any other machine just ask around for what are the equivalent packages)

git clone https://github.com/bitcoin/bitcoin.git

cd bitcoin

git checkout v0.9.0

sudo apt-get install build-essential libboost-all-dev automake libtool autoconf #this part is dependent on Ubuntu

mkdir $PWD/release #I dont want to install bitcoin systemwide, so I make a local dir.

./autogen.sh && ./configure –with-incompatible-bdb –prefix=$PWD/release


make install


P.S. if anybody reading this is on another platform and figures out a slightly different way of doing things, please post in the comments

The intricacies of Bitcoin

What are some of the under-the-surface aspects of Bitcoin that safeguard its larger application as a viable digital currency

transaction fees

each transaction in Bitcoin is subject to transaction fees, that prevent something called Dust Spam. Where do these fees go ? Txn fees goes to whoever processes the block that contains the transaction. Additional reward for miners. Very cool !

whichever miner solves the next block gets to include a transaction for 50 BTC to themself from “coinbase.”  It turns out that  that if any of the transactions the miner included in your block had a fee attached to it, then the miner gets to include those, too.  Therefore, when a miner solves a block, it typically gets something like 50.75 BTC instead of 50.  The more transactions there were, the more fees received.

If you look at the BTC webpage description about what happens when there’s no more rewards for solving blocks, it mentions that they expect the network to be big enough by then that it will be worth solving blocks solely for the fees.  If there are 10,000 transactions per block, at 0.005 per transaction fee, that’s 50 BTC in fees.  If BTC really catches on, this is a realistic volume of transactions

transactions fees are also voluntary – a transaction fee will increase the chances that a miner will include your transaction in the block he mines. Actually, a miner  just dump the top few hundred KB of  transactions into a block, sorted by transaction fee (descending, of course). When there aren’t many transactions, maybe because of a series of block in a short amount of time, it will be confirmed anyway.

Do Transactions get lost ?

When you send a transaction, it sends a packet to all connected peers. These peers store the transaction in their in-memory pools and tell all their connections that they have a new transaction. When those connections don’t have it yet, they ask for it, and that’s how a transaction spreads over the network.

When a user restarts their client, the memory pool is wiped, and the unconfirmed/unmined transaction is deleted from that computer. But it is still available on other clients. it’s very unlikely for the transaction to be gone from the entire network.

Solving a block

a block is a list of transactions broadcast by the Bitcoin network. This system evolved because of the question “how do I build a distributed transaction network without a central authority“. What will motivate people to contribute computational and network time to the Bitcoin system ? Well, the chance to make money.

Bitcoin miners act as distributed banks – or more aptly, those irritating credit card salesman who tell you “please take this credit card“. Each of them is trying to be the eager salesman and be the first to “process” your transaction – and the way they do it is solve a puzzle. The process of “Mining” is essentially the process of competing to be the next to find the answer that “solves” the current block

The mathematical problem in each block is difficult to solve, but once a valid solution is found, it is very easy for the rest of the network to confirm that the solution is correct.

Miners are essentially putting a notary stamp on a batch of transactions. That’s all they are needed for.

But how do you prevent from having a corrupt notary? Bitcoin does this by having tens of thousands of potential notaries and one of them will happen to be the lucky one that gets to do the stamp. The lucky one is the one who happens to solve the problem. All the potential notaries try to solve the puzzle over and over, but it will take about ten minutes for one to become successful.


Essentially it is a cost function that determines how hard hashing should be so that one block is found every 10 minutes, on average.


For every 2016 blocks that are found, the timestamps of the blocks are compared to find out how much time it took to find 2016 blocks, call it T. We want 2016 blocks to take 2 weeks, so if T is different, we multiply the difficulty by (2 weeks / T) – this way, if the hashrate continues the way it was, it will now take 2 weeks to find 2016 blocks.

P.S. 2016 blocks in 14 days = 144 blocks per day. This is expected difficulty

The difficulty can increase or decrease depending on whether it took less or more than 2 weeks to find 2016 blocks. Generally, the difficulty will decrease after the network hashrate drops.

If the correction factor is greater than 4 (or less than 1/4), then 4 or 1/4 are used instead, to prevent the change to be too abrupt.

NOTE: There is a bug in the implementation, due to which the calculation is based on the time to find the last 2015 blocks rather than 2016. Fixing it would require a hard fork and is thus deferred for now.

Difficulty should settle around the 70-billion mark, assuming 300 USD/BTC, 0.08 USD/kWh, 1J/GH (with Gen2 ASICs dominating the field).

A bitcoin miner’s profit relates to the amount of hashing power they contribute to the network. Since their mining power is constant, their share of the total hashing power decreases relatively when the network’s hashing power increases. I.e.

newProfit = currentProfit * currentDiff/newDiff.

At a currentProfit of 1BTC/d and a 30% increase in difficulty, they get:

(1BTC/d)*100/(100+30)= (1BTC/d)/1.3 = 0.76923077 BTC/d

i.e. their profit decreases by ~23%.

Network Hashrate – the mathematics of difficulty.

To find a block, the hash must be less than the target. The hash is effectively a random number between 0 and 2**256-1.

hash_rate = (blocks_found/expected_blocks*difficulty * 2**32 / 600)

Block maturation

Generated coins can’t be spent until the generation transaction has 101 confirmations. Transactions that try to spend generated coins before this will be rejected

The purpose is to prevent a form of transaction reversal (most commonly associated with “double spends”) if the block is orphaned. If a block is orphaned the coinbase reward(block subsidy + tx fees). “ceases to exist”. The coins are produced from the block and when a block is orphaned it is the replacement blocks version of the coinbase tx which is considered valid by the network.

So to avoid that undesirable situation the network requires coinbase tx (rewards to miners) to “mature” or wait 100 confirmations (the client makes this 120 confirmations but only 100 is required by the protocol). If a block is orphaned before it gets 100 blocks deep into the chain, then only the miner is affected.

The referece code that checks this is

 // If prev is coinbase, check that it's matured
            if (txPrev.IsCoinBase())
                for (CBlockIndex* pindex = pindexBlock; pindex && pindexBlock->nHeight - pindex->nHeight < COINBASE_MATURITY; pindex = pindex->pprev)
                    if (pindex->nBlockPos == txindex.pos.nBlockPos && pindex->nFile == txindex.pos.nFile)
                        return error("ConnectInputs() : tried to spend coinbase at depth %d", pindexBlock->nHeight - pindex->nHeight);

Block Size Limit

Currently, the block subsidy reduces the motivation of miners to include transactions, because 99% of their income comes from the subsidy. Including zero transactions wouldn’t impact them greatly.

But when the block reward shrinks, miners may get 99% of their income from transactions. This means they will be motivated to pack as many transactions as possible into their blocks. They will receive the fees, and the rest of the bitcoin network will be burdened with the massive block they created. Without a hard limit on block size, miners will have incentive to include each and every fee-carrying transaction that will fit.

So a single entity benefits, and everyone else shoulders the cost with very little benefit.


Blockchain fork and double spending

If you understood what a block is, you can ask the question – what happens if two independent miners find an independent answer to the puzzle of “solving a block”?

Good question and this is called a blockchain fork – because now there are two ways everyone else can accept what the “current block” is. The way this is resolved is that The client accepts the ‘longest’ chain of blocks as valid. The ‘length’ of the entire block chain refers to the chain with the most combined difficulty, not the one with the most blocks. This prevents someone from forking the chain and creating a large number of low-difficulty blocks, and having it accepted by the network as ‘longest’.

Now, remember that the miners decide which chain is valid by continuing to add blocks to it. The longest block chain is viewed as the valid block chain, because the majority of the network computation is assumed not to come from malicious users.

– Race conditions

If you wallet accepts incoming peer connections, they can potentially control the information you receive (by flooding your connections). This means that it is possible to convince you about transactions that the larger network is rejecting. So, if you are a Bitcoin accepting merchant, disable your incoming connections and connect to reputed nodes to confirm transactions.

It is worth noting that a successful attack costs the attacker one block – they need to ‘sacrifice’ a block by not broadcasting it, and instead relaying it only to the attacked node.

There is a variant of this called the Finney attack which needs the collusion of a large miner – unlikely, but still possible.

– Brute force and >50% attacks

The attacker submits to the merchant/network a transaction which pays the merchant, while privately mining a blockchain fork in which a double-spending transaction is included instead. After waiting for n confirmations, the merchant sends the product. If the attacker happened to find more than n blocks at this point, he releases his fork and regains his coins; otherwise, he can try to continue extending his fork with the hope of being able to catch up with the network. If he never manages to do this, the attack fails and the payment to the merchant will go through.

NOTE: Theoretically someone with massive computational power can create start controlling this, but that is highly unlikely.

Aptitude equivalent of “apt-get autoremove”

As I am moving away from Ubuntu towards base debian, I am realigning my workflow to work better in the latter. The biggest being, unsuprisingly, package management.

One of the biggest cons against aptitude was that it does not do “autoremove” as apt-get does. This actually turned out to be false and more of a planned design choice (read more here – http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=683866)

In essence the aptitude equivalent is “sudo aptitude -oAptitude::Delete-Unused=1 install”

Getting a fully activated Windows 8.1 installed on VirtualBox – and the crap that Microsoft puts you through

  • Create a new VM on VirtualBox. Make sure you are running atleast VirtualBox 4.2.16
  • run vboxmanage setextradata <name of VM>  VBoxInternal/CPUM/CMPXCHG16B 1
  • vboxmanage list vms – get the UUID for the VM that you just created
  • vboxmanage modifyvm <UUID of VM> –hardwareuuid <some random UUID> – what this step does is create a hardware uuid for your VM. Now, after you activate windows, you can clone this VM and not lose activation. (Dear Microsoft, this is not to screw you, but give myself the power to be destructive to VMs. I am using a VM, so that I dont have to be too careful with it.)
  • Download a Windows 8.1 ISO from http://winsupersite.com/windows-8/windows-81-tip-download-windows-81-iso-windows-8-product-key
  • Make sure your “network card” in VirtualBox is disconnected. This way you can make a local account, instead of being tied to a Live account. This has a lot of problems just in case you are not connected to the internet at a later stage of using your PC.
  • Use a “Windows 8.1 Generic Key” to install your Windows 8.1 ISO
  • Once Windows is installed, you will need to change the key to your key. To do that, Right-Click the Start button and start the Command Prompt (Admin).
    Type the following commands:
    slmgr /rearm
    reboot you computer and it should be activated.
  • To get full screen on VirtualBox, you need to use VirtualBox Guest Additions 4.2.16 ISO (not a higher version – there is a bug)
  • For heavens sake uninstall “Skype for Windows 8″ tile and install proper Skype from here.


God – seriously ? These are all the steps I need to follow to install an OS that I have bought on VirtualBox ? This is bullcrap.

Microsoft, please learn to trust your customer base a little bit more.

Quick and dirty NoSQL cheatsheet

  • document-oriented database – not key-value database
  • maximum value size is 16mb kept in the BSON binary format.
  • for any sharded or clustered setup, all reasonable queries must happen through map-reduce rather than the query framework.
  • biggest con is the database level write lock, which starts impacting fairly quickly as your data size grows.
  • must finely tune indexes for performance – all other queries must happen through map-reduce.
replication and clustering: Master/slave replication with datacenter level failover (http://docs.mongodb.org/manual/data-center-awareness/)
CAP tradeoffs:
  • MongoDB tunes for consistency over availability – via its locking and a single master for accepting writes.
  • There is no versioning.
  • warning: default configuration does not acknowledge write and is the reason for its performance. on turning on acknowledged writes, performance  drops to be same or worse as other nosql systems.
  • document-oriented database – not key-value database
  • biggest pro – memcached compliant api.
  • You can store values up to 1 MB in memcached buckets and up to 20 MB in Couchbase buckets. Values can be any arbitrary binary data or it can be a JSON-encoded document.
  • uses indexes called “views” for performance.

Replication and clustering:

  • master-slave replication. All writes must happen on the master.
CAP tradeoffs:
  • prefers consistency over availability. all writes go to single master.
  • biggest con: Has eventual persistence – Writes are *not* flushed to disk immediately for performance (kept in RAM).
  • does not support transactions or MVCC. it is ACID compliant on a single operation, however eventual persistence means that data can still be lost.
data-modeldocument-oriented database – not key-value database
replication and clustering: master-master replication as well as the most seamless cluster setup among all nosql (bigcouch merged into couchdb)
  • CouchDB allows for creation of indexes on separate disks (SSD?) called “views” that speeds up queries by many orders of magnitude.
  • biggest con – only access is through an inbuilt REST api which adds about 100ms to each request. All data interchange is through JSON which means performance for large documents will suffer (due to serialization)
  • queries are implicitly map-reduce.
  • relies on page-cache for performance.
consistency: biggest pro is that it has built in MVCC and transactions, so there will *never* be race conditions between read and write.
CAP tradeoffs:
  • prefers availability to consistency – especially in its multi-datacenter setup. All checks and balances happen through revision numbers that means disk usage increases pretty fast (due to previous revision numbers).
  • needs periodic compaction to clean up.
  • key-value database   – not document-oriented database
  • 2GB column value. maximum number of cells in a single partition is 2 billion. however different partitions can be on different machines/vms.
replication and clustering:  Cassandra is aware of network topology and does cross-datacenter replication fairly robustly among all the nosql systems (http://www.datastax.com/docs/0.8/cluster_architecture/cluster_planning)
  • Cassandra tends to sacrifice read performance in order to improve write performance. Because of the log structured design of Cassandra, a single row is spread across multiple sstables. Reading one row requires reading pieces from multiple sstables. However, this comes with a much, much higher fine tuning control on disk layout and data locality.
  • Cassandra also relies on OS page cache for caching the index entries.


  • Cassandra does not offer fully ACID-compliant transactions, however the cluster itself can audit success/failure. For example, if using a write consistency level of QUORUM with a replication factor of 3, Cassandra will send the write to 2 replicas. If the write fails on one of the replicas but succeeds on the other, Cassandra will report a write failure to the client. However, the write is not automatically rolled back on the other replica. However, your application does get to know the failure condition.
  • There are no locks
  • no MVCC – Cassandra uses timestamps to determine the most recent update to a column. The timestamp is provided by the client application. The latest timestamp always wins when requesting data, so if multiple client sessions update the same columns in a row concurrently, the most recent update is the one that will eventually persist.
CAP tradeoffs:
  • Cassandra supports tuning between availability and consistency, and always gives you partition tolerance. Cassandra can be tuned to give you strong consistency in the CAP sense where data is made consistent across all the nodes in a distributed database cluster. A user can pick and choose on a per operation basis how many nodes must receive a DML command or respond to a SELECT query.
  • Writes in Cassandra are durable. All writes to a replica node are recorded both in memory and in a commit log before they are acknowledged as a success. If a crash or server failure occurs before the memory tables are flushed to disk, the commit log is replayed on restart to recover any lost writes.