Access Bioinformatics Databases with Biopython

Bhagesh Hunakunti
Mar 24, 2021

A common task in bioinformatics is extracting information from biological databases. Accessing these databases manually can be quite tedious, especially if you have a lot of repetitive work to do.

Biopython attempts to save you time and energy by making some online databases available from Python scripts.

Currently, Biopython has code to extract information from NCBI (including Entrez), PDB and KEGG. Each of these databases is accessed through its own module, named after the database. The code in these modules makes it easy to write Python programs that interact with the CGI scripts on these sites, so that you get results in a format that is easy to work with. In some cases, the results are tightly integrated with the Biopython parsers, which makes extracting information even easier.
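To make the "CGI scripts" part concrete, here is a minimal sketch of the kind of request that sits behind an Entrez fetch, followed by the same fetch done through Biopython. The helper names (`build_efetch_url`, `fetch_fasta_record`) and the accession `NM_000546` are my own illustrative choices, not part of Biopython, and the URL is only an approximation of what `Bio.Entrez` sends (the library also adds parameters such as your email address). The Biopython import sits inside the function so that the URL helper can be tried without Biopython installed.

```python
from urllib.parse import urlencode

# Address of NCBI's E-utilities "efetch" script.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(db, record_id, rettype="fasta", retmode="text"):
    """Build the query URL by hand (hypothetical helper, for illustration only)."""
    query = urlencode(
        {"db": db, "id": record_id, "rettype": rettype, "retmode": retmode}
    )
    return f"{EFETCH_URL}?{query}"


def fetch_fasta_record(record_id, email):
    """Fetch one nucleotide record through Bio.Entrez and parse it with SeqIO.

    Needs Biopython and network access. NCBI asks for a contact email.
    """
    from Bio import Entrez, SeqIO  # imported lazily, see the note above

    Entrez.email = email
    handle = Entrez.efetch(
        db="nucleotide", id=record_id, rettype="fasta", retmode="text"
    )
    try:
        # The handle goes straight into a Biopython parser.
        return SeqIO.read(handle, "fasta")
    finally:
        handle.close()


if __name__ == "__main__":
    print(build_efetch_url("nucleotide", "NM_000546"))
```

Calling `fetch_fasta_record("NM_000546", "you@example.org")` should give you a `SeqRecord` whose `id`, `description` and `seq` you can use directly, which is the "tight integration with parsers" mentioned above.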

Here I present a project that will guide you through writing Biopython scripts to access data from various databases.

By the end of this project, you will learn to access, parse, and visualize data from various bioinformatics sequence and structural online databases such as ENTREZ, PDB, KEGG and NCBI using Biopython.

Learn step-by-step

In a video that plays in a split-screen with your work area, your instructor will walk you through these steps:

  1. Run a sequence alignment using NCBI BLAST
  2. Fetch PubMed records and nucleotide sequences using Entrez
  3. Fetch proteins from PDB
  4. Query PROSITE and ScanProsite through ExPASy
  5. Access the KEGG database
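The five steps above could be sketched roughly as follows. This is not the code from the project video; it is a minimal outline I put together, assuming Biopython is installed and the remote services are reachable. The function names and the `STEP_MODULES` mapping are hypothetical, while the Biopython calls inside them (`NCBIWWW.qblast`, `Entrez.efetch`, `PDBList.retrieve_pdb_file`, `ExPASy.get_prosite_raw`, `REST.kegg_get`) are the ones I would reach for. Imports are kept inside each function so you can load the file and run only the step you need.

```python
# Step name -> Biopython module used for that step in this sketch.
STEP_MODULES = {
    "blast": "Bio.Blast.NCBIWWW",
    "entrez": "Bio.Entrez",
    "pdb": "Bio.PDB",
    "prosite": "Bio.ExPASy",
    "kegg": "Bio.KEGG.REST",
}


def blast_hit_titles(sequence, limit=5):
    """Step 1: run an online blastn search and return the first hit titles."""
    from Bio.Blast import NCBIWWW, NCBIXML

    handle = NCBIWWW.qblast("blastn", "nt", sequence)  # can take minutes
    record = NCBIXML.read(handle)
    return [alignment.title for alignment in record.alignments[:limit]]


def fetch_pubmed_abstract(pmid, email):
    """Step 2: fetch a PubMed abstract as plain text through Entrez."""
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.efetch(db="pubmed", id=pmid, rettype="abstract", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()


def download_pdb_structure(pdb_id, directory="."):
    """Step 3: download a PDB entry and parse it into a Structure object."""
    from Bio.PDB import PDBList, PDBParser

    path = PDBList().retrieve_pdb_file(pdb_id, pdir=directory, file_format="pdb")
    return PDBParser(QUIET=True).get_structure(pdb_id, path)


def fetch_prosite_record(accession):
    """Step 4: fetch a raw PROSITE entry from ExPASy and parse it."""
    from Bio import ExPASy
    from Bio.ExPASy import Prosite

    handle = ExPASy.get_prosite_raw(accession)
    try:
        return Prosite.read(handle)
    finally:
        handle.close()


def fetch_kegg_entry(entry_id):
    """Step 5: fetch a KEGG entry as flat text over the KEGG REST interface."""
    from Bio.KEGG import REST

    return REST.kegg_get(entry_id).read()
```

For example, `fetch_kegg_entry("hsa:7157")` should return the flat-file text for a human gene entry, and `download_pdb_structure("1TUP")` should give you a structure you can walk through model by model. Both identifiers are only examples I picked.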

You will also interact with various bioinformatics file formats such as FASTA, PDB, GENBANK and XML along with various parsers to read and modify these files using Biopython.
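As a small taste of the parsers, here is a sketch that picks a `Bio.SeqIO` format name from a file extension and then reads every record in the file. The extension table and both function names are my own assumptions for illustration; the format names `"fasta"`, `"genbank"` and `"pdb-atom"` are ones `SeqIO` accepts. BLAST XML output is left out on purpose, since that is read with `Bio.Blast.NCBIXML` and not with `SeqIO`.

```python
from pathlib import Path

# File extension -> Bio.SeqIO format name (an illustrative, incomplete table).
SEQIO_FORMATS = {
    ".fasta": "fasta",
    ".fa": "fasta",
    ".gb": "genbank",
    ".gbk": "genbank",
    ".pdb": "pdb-atom",
}


def guess_seqio_format(path):
    """Return the SeqIO format name for a file, based only on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return SEQIO_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"no SeqIO format known for {suffix!r}") from None


def read_sequence_lengths(path):
    """Parse a sequence file and map each record id to its sequence length.

    Needs Biopython; the import is lazy so the helper above works without it.
    """
    from Bio import SeqIO

    file_format = guess_seqio_format(path)
    return {record.id: len(record.seq) for record in SeqIO.parse(path, file_format)}
```

The same `SeqIO.parse` loop works for FASTA, GenBank and PDB files alike, which is what makes switching between formats in Biopython so painless.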

Happy Learning!

Bhagesh Hunakunti

Pursuing Masters in Bioinformatics, Digital artist and content creator. Contact: https://linktr.ee/BhageshCodebeast