DNA diversity index script

Client: Julian Marchesi
Technologies used: Python

The client needed to calculate a DNA diversity index, which meant reading a large file listing DNA sequences grouped by species. A second file contained a matrix of sequence distances.

The sequence dataset could contain 500 000+ values, while the matrix dataset could contain 10 000 x 10 000 distance values. The greatest challenge was designing an algorithm that reads the data and calculates the required value in as little time as possible, because the script would be run on multiple datasets.
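As a rough illustration of the kind of processing involved, the sketch below streams both inputs line by line and computes a per-species value. It is not the script delivered to the client. The file layout (tab-separated "species, sequence id" records, whitespace-separated matrix rows), the function names, and the choice of mean pairwise distance as the "diversity index" are all assumptions made for the example, since the exact formats and formula are not described here.

```python
from collections import defaultdict
from itertools import combinations


def group_indices_by_species(lines):
    """Map each species to the matrix indices of its sequences.

    Assumed (hypothetical) format: one "species<TAB>sequence_id" record per
    line, listed in the same order as the rows of the distance matrix.
    """
    groups = defaultdict(list)
    index = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines do not consume a matrix index
        species, _sequence_id = line.split("\t", 1)
        groups[species].append(index)
        index += 1
    return groups


def read_distance_matrix(lines):
    """Parse whitespace-separated rows of floats, one matrix row per line."""
    return [[float(value) for value in line.split()]
            for line in lines if line.strip()]


def mean_pairwise_distance(matrix, indices):
    """Average distance over all unordered pairs within one group.

    Assumption: this stands in for the diversity index; a group with fewer
    than two sequences is given a value of 0.0.
    """
    n = len(indices)
    if n < 2:
        return 0.0
    total = sum(matrix[i][j] for i, j in combinations(indices, 2))
    return total / (n * (n - 1) / 2)


def diversity_by_species(sequence_lines, matrix_lines):
    """Compute the assumed index for every species in a single pass each."""
    groups = group_indices_by_species(sequence_lines)
    matrix = read_distance_matrix(matrix_lines)
    return {species: mean_pairwise_distance(matrix, indices)
            for species, indices in groups.items()}
```

Both readers accept any iterable of lines, so open file objects can be passed directly and each file is read only once. For a 10 000 x 10 000 matrix, a list of lists of floats is memory-hungry; an array library would likely be a better fit at that size, but the plain-Python version keeps the sketch dependency-free.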
