Published on 2023.08.12 by Luis Pedro Coelho.
We are looking for PhD students! Note that PhDs in Australia are research-based and do not require coursework. A background in computer science, bioinformatics, microbiology, or genomics is best, but there are no strict limits. Some basic programming skills are needed (some projects require more than others). Projects can be defined for and with each individual student within the broad topic of studying the global microbiome, and some specific projects in the group are 1. small proteins of the global microbiome 2. antibiotic resistance at the global scale 3. [tool...
Published on 2022.11.30 by Jelena Somborski, Celio Dias Santos Jr., Luis Pedro Coelho.
Abstract: Accurate data visualization is essential in scientific communication. Scientific colormaps must represent data fairly and be universally readable. This post summarizes guidelines and tips for using color in data visualization.
## Background
In scientific communication, accurate data representation is crucial. Colormaps are arrays of values that determine the colors of images, figures, and other graphical objects. They play an important role in data visualization because, for humans, color vision is a potent and fast way of acquiring information. Adding colors is much more than adding an aesthetic...
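To make "a colormap is an array of colors" concrete, here is a minimal sketch that maps data values onto a discrete colormap and applies one commonly recommended readability check: luminance should change monotonically along the map, so the ordering of values survives grayscale printing and is easier to read for people with color-vision deficiencies. The five viridis colors are matplotlib's viridis sampled at 0, 0.25, 0.5, 0.75 and 1; the rainbow map and the function names are illustrative assumptions, not taken from the post.

```python
# A colormap as a plain array of sRGB colors (hex strings).
# viridis sampled at 0, 0.25, 0.5, 0.75 and 1.0
VIRIDIS_5 = ["#440154", "#3b528b", "#21918c", "#5ec962", "#fde725"]
# A rainbow-like map built from pure hues, for comparison (hypothetical example)
RAINBOW_5 = ["#0000ff", "#00ffff", "#00ff00", "#ffff00", "#ff0000"]


def value_to_color(x, cmap, vmin=0.0, vmax=1.0):
    """Map a data value to the nearest entry of a discrete colormap."""
    t = (x - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp out-of-range values to the ends
    return cmap[int(t * (len(cmap) - 1) + 0.5)]


def relative_luminance(hex_color):
    """Relative luminance of an sRGB color (sRGB linearization, Rec. 709 weights)."""
    channels = []
    for i in (1, 3, 5):
        c = int(hex_color[i:i + 2], 16) / 255
        # Undo the sRGB transfer function to get linear light
        channels.append(c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4)
    r, g, b = channels
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def lightness_is_monotonic(cmap):
    """True if luminance strictly increases, or strictly decreases, along the map."""
    lum = [relative_luminance(c) for c in cmap]
    diffs = [b - a for a, b in zip(lum, lum[1:])]
    return all(d > 0 for d in diffs) or all(d < 0 for d in diffs)


print(lightness_is_monotonic(VIRIDIS_5))  # True
print(lightness_is_monotonic(RAINBOW_5))  # False
```

The rainbow map fails because cyan and green are much brighter than the blue and red at its ends, so luminance rises and then falls again.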
Published on 2022.09.29 by Breno Lívio Silva de Almeida, Celio Dias Santos Junior, Luis Pedro Coelho, Yiqian Duan.
Abstract: Using a dataset of around 1 billion unique smORFs, we generated rarefaction curves for individual habitats and for the higher-level environments that group them. The curves show that habitats such as soil remain relatively under-sampled and have higher smORF richness than other environments.
## Distribution of the Small Open Reading Frames
There are millions of open reading frames (ORFs) spread across genomes from diverse habitats that, if translated, would generate many small proteins (those with fewer than 100...
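As an illustration of what a rarefaction curve measures, the sketch below builds a sample-based accumulation curve: samples are added in random order and the number of distinct smORFs seen so far is recorded, averaged over several random orders. A curve that keeps rising, rather than flattening, suggests that the habitat is still under-sampled. The excerpt does not describe the exact procedure used in the post, so the input format (one set of smORF identifiers per sample) and the function name are assumptions.

```python
import random


def rarefaction_curve(samples, permutations=10, seed=0):
    """Mean number of unique items seen after adding 1..n samples.

    `samples` is a list of sets, each holding the smORF identifiers
    detected in one sample (hypothetical input format).
    """
    rng = random.Random(seed)
    n = len(samples)
    totals = [0] * n
    for _ in range(permutations):
        order = samples[:]  # shuffle a copy, keep the input intact
        rng.shuffle(order)
        seen = set()
        for i, sample in enumerate(order):
            seen |= sample  # accumulate unique identifiers
            totals[i] += len(seen)
    return [t / permutations for t in totals]


samples = [{"a", "b"}, {"b", "c"}, {"c", "d", "e"}, {"a"}]
print(rarefaction_curve(samples))
```

The last point of the curve always equals the total number of distinct identifiers; only the shape of the curve before it depends on the sampling order.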
Published on 2022.06.23 by Svetlana Ugarcina Perovic, Luis Pedro Coelho.
Update (Sept 14, 2022): With the release of RGI version 6.0, this post is no longer relevant, as version 6 excludes nudged hits by default. We thank the RGI team for their continuous work on improving this tool. Many antibiotic resistome studies are based on CARD annotation hits mapped through the RGI pipeline. This post is not about the database, but about a “feature” of the widely used tool RGI that can affect (and probably already has affected) your results, producing a large number of “Perfect” and “Strict” hits. While...
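For results produced by RGI versions before 6.0, one way to drop nudged hits after the fact is to filter the main tab-separated output. This is a minimal sketch that assumes the output header contains a `Nudged` column set to `True` for nudged hits; column names can differ between RGI versions, so check the header of your own file first. The example rows are invented.

```python
import csv
import io


def drop_nudged_hits(rgi_tsv_text):
    """Return RGI hits whose `Nudged` column is not set to True.

    Assumes tab-separated RGI output with a header row that includes a
    `Nudged` column (an assumption: verify against your RGI version).
    """
    reader = csv.DictReader(io.StringIO(rgi_tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        # Treat "True"/"TRUE"/"true" as nudged; empty or missing as not nudged
        nudged = (row.get("Nudged") or "").strip().lower() == "true"
        if not nudged:
            kept.append(row)
    return kept


example = (
    "ORF_ID\tCut_Off\tBest_Identities\tNudged\n"
    "orf1\tPerfect\t100.0\t\n"
    "orf2\tStrict\t96.5\tTrue\n"
    "orf3\tStrict\t88.0\t\n"
)
print([row["ORF_ID"] for row in drop_nudged_hits(example)])  # ['orf1', 'orf3']
```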
Published on 2021.10.15 by Anna Vines, Célio Dias Santos Júnior, Luis Pedro Coelho.
tl;dr: In our research, we aimed to investigate in silico the production of cryptic antimicrobial peptides (cryptic AMPs) by prokaryotes. Using AMPSphere and the proGenomes2 representative genomes dataset, we found >50,000 potential cryptic AMP candidates produced by microbes. We further analysed the properties of these cryptic AMPs by assessing their position within their precursor proteins, comparing their antimicrobial scores with those of the surrounding protein regions, and determining which enzymes could release them through proteolysis. We conclude that there is a potential...
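To illustrate what "which enzymes could release them through proteolysis" means in practice, the sketch below checks whether a candidate peptide sits between two cleavage sites of a single protease. It uses the textbook simplification of trypsin specificity (cut after K or R, but not before P) as a stand-in; the post's actual set of enzymes and rules is not described in this excerpt, and the sequences are made up.

```python
def trypsin_sites(seq):
    """Positions after which trypsin is expected to cut (simplified rule).

    Rule of thumb: cleave C-terminal to K or R, unless the next residue is P.
    A returned index i means the bond between seq[i-1] and seq[i] is cleaved.
    """
    return {
        i + 1
        for i, aa in enumerate(seq[:-1])
        if aa in "KR" and seq[i + 1] != "P"
    }


def releasable_by_trypsin(precursor, peptide):
    """True if `peptide` occurs in `precursor` with a trypsin site (or a
    protein terminus) at both of its ends."""
    sites = trypsin_sites(precursor) | {0, len(precursor)}
    start = precursor.find(peptide)
    while start != -1:
        end = start + len(peptide)
        if start in sites and end in sites:
            return True
        start = precursor.find(peptide, start + 1)  # try later occurrences
    return False


precursor = "MASKFLGRPAKWLKR"  # hypothetical precursor protein
print(releasable_by_trypsin(precursor, "FLGRPAK"))  # True
print(releasable_by_trypsin(precursor, "FLGR"))  # False: R is followed by P
```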
Published on 2021.03.29 by Svetlana Ugarcina Perovic.
In this moment of pandemic-induced remote work, having an online presence is more important than ever. Twitter is definitely one of the most powerful platforms for promoting and disseminating academic outputs in a less formal way. In this post, I will show you how beneficial scrolling through meaningful short messages (tweets) can be for your research, and how Twitter users (tweeps) can help you make a far greater impact. At the end of the post, you can find a list of good reads not only about Twitter, but also about academic profile platforms that can make your academic life much easier. I love...
Published on 2021.03.24 by Shaojun Pan, Luis Pedro Coelho.
Update (June 2021): This blogpost originally used the old name of the tool (S3N2Bin), which is now called SemiBin. The blogpost and the links have been updated to use the new name. tl;dr SemiBin is a new tool for binning (inferring MAGs, metagenome-assembled genomes) that exploits deep semi-supervised learning. It can recover more high-quality genomes than other tools.
## The background: what is contig binning?
The goal is to derive metagenome-assembled genomes (MAGs). The first step (after QC) is to assemble contigs. Since one can rarely obtain genomes from a metagenome as a single (circular)...
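Binning groups contigs that likely come from the same genome, and many binners describe each contig by its sequence composition (often tetranucleotide frequencies) together with its coverage across samples. As a sketch of the composition part only, the code below computes a normalized 4-mer frequency vector for a contig. It is a generic illustration, not SemiBin's implementation, and it skips refinements some tools use, such as merging reverse-complement k-mers.

```python
from itertools import product

# All 256 DNA 4-mers in a fixed order, so vectors are comparable across contigs
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
KMER_INDEX = {k: i for i, k in enumerate(KMERS)}


def tetranucleotide_profile(contig):
    """Normalized 4-mer frequency vector of a contig (ambiguous bases skipped)."""
    counts = [0] * len(KMERS)
    seq = contig.upper()
    for i in range(len(seq) - 3):
        idx = KMER_INDEX.get(seq[i:i + 4])
        if idx is not None:  # skip windows containing N or other symbols
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


profile = tetranucleotide_profile("ACGTACGTNNACGT")
print(len(profile))  # 256
```

Contigs from the same genome tend to have similar profiles, which is why distances between such vectors are a useful (though not sufficient) signal for grouping contigs into bins.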
Published on 2021.03.20 by Célio Dias Santos Júnior, Luis Pedro Coelho.
Shortly after the release of AMPSphere v.2021-02, we are now making available the validated SPHERE families obtained by clustering the alphabet-reduced antimicrobial peptide sequences. Figure 1: AMPSphere v.2021-03 at Zenodo. This new version, v.2021-03, provides alignments and trees in Newick format, calculated for level III families (hierarchically clustered at 100%, 85%, and 75% identity) with 8 or more sequences. No other...
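Since the release ships trees in Newick format, a small helper that lists the leaf labels of a tree (for example, to check how many sequences a family contains) can be handy. This is a minimal sketch, not a full Newick parser: it assumes unquoted labels and does not handle single-leaf trees. The example tree and its labels are made up.

```python
import re

# A leaf label is the text right after "(" or "," up to the next ":", ",", ")" or ";".
# Internal-node labels follow ")" and are therefore not matched.
LEAF_RE = re.compile(r"[(,]\s*([^(),:;\s]+)")


def newick_leaves(newick):
    """Leaf names of an unquoted Newick tree string (simplified parser)."""
    return LEAF_RE.findall(newick)


tree = "((AMP_a:0.10,AMP_b:0.20)90:0.05,AMP_c:0.30);"
print(newick_leaves(tree))  # ['AMP_a', 'AMP_b', 'AMP_c']
```

For anything beyond quick checks, a dedicated phylogenetics library is a safer choice than a regular expression.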
Published on 2021.03.05 by Célio Dias Santos Júnior, Yiqian Duan, Luis Pedro Coelho.
Fellow researchers, we are glad to inform you that the database AMPSphere v.2021-02 is now available online under DOI: 10.5281/zenodo.4574469. AMPSphere is a comprehensive catalog of antimicrobial peptides (AMPs) predicted using Macrel (DOI: 10.7717/peerj.10555) from 63,410 public metagenomes, the ProGenomes v2.2 database (82,400 high-quality microbial genomes), and ca. 4,000 non-whitelisted microbial genomes from NCBI. Version v.2021-02 contains 863,498 sequences (avg length: 36 amino acids...
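Summary figures such as the number of sequences and their average length can be recomputed from a FASTA file of peptides. The sketch below is a generic FASTA reader written for illustration; the release's file names are not given here, so the example uses a few made-up inline records instead of a real path.

```python
def fasta_lengths(lines):
    """Yield (identifier, sequence length) for each record in FASTA text lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length  # emit the previous record
            header = line[1:].split()
            name, length = (header[0] if header else ""), 0
        else:
            length += len(line)  # sequences may span several lines
    if name is not None:
        yield name, length


def summarize(lines):
    """Number of sequences and their mean length."""
    lengths = [n for _, n in fasta_lengths(lines)]
    return len(lengths), (sum(lengths) / len(lengths) if lengths else 0.0)


example = """>AMP_1
GLFDIVKKVV
>AMP_2
KWKLFKKIGAVLKVL
""".splitlines()
print(summarize(example))  # (2, 12.5)
```

With a real file, the same function can be fed the open file object (or `gzip.open(path, "rt")` for a compressed one) instead of a list of lines.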
Published on 2021.03.04 by Karma Dolkar, Luis Pedro Coelho.
Microbiopy aims to become a tool that implements machine learning analysis of microbiome count data using Python. Currently, it implements basic functionality, which can already be used to perform some basic analyses as described here.
## The Microbiome
The microbiome is the collection of all microbes in an environment. Many different bacterial, fungal, and archaeal species make up these microbial consortia. They can be found in a variety of environments, ranging from the human body (e.g., mouth, skin, and gut) and animal bodies to soil, glacier ice, seawater, and even the walls and floors of homes.
## Microbiopy...
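Before applying machine learning to microbiome count data, raw counts are commonly converted to relative abundances (so samples with different sequencing depths become comparable) and then log-transformed. The sketch below shows that generic preprocessing in plain Python; it is not Microbiopy's API, and the pseudocount value and toy count table are illustrative assumptions.

```python
import math


def relative_abundance(counts):
    """Scale one sample's counts so they sum to 1 (total-sum scaling)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no counts")
    return [c / total for c in counts]


def log_transform(abundances, pseudocount=1e-6):
    """Log-transform relative abundances; the pseudocount avoids log(0)."""
    return [math.log10(a + pseudocount) for a in abundances]


# Rows are samples, columns are taxa (hypothetical toy count table)
table = [[10, 0, 90], [5, 5, 0]]
features = [log_transform(relative_abundance(row)) for row in table]
print(features[0])
```

The resulting feature rows can then be passed to any classifier; the choice of pseudocount affects how strongly zero counts are separated from rare taxa.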
Published on 2021.02.12 by Luis Pedro Coelho.
Initial note: For the full story, see the GUNC manuscript. Luis helped a bit with developing the ideas behind the current iteration of the CSS and wrote this blogpost, but most of the work was done by Askarbek, Anthony, and Sebastian. Let's say you have a metagenome-assembled genome (MAG) and want to check if it's good. What should you do? Use CheckM! Let's say you have already used CheckM and want even more assurance. In particular, you want to check for possible contamination to ensure (to the extent...
Published on 2021.01.29 by Tristan Gallent, Luis Pedro Coelho.
## Preliminaries
At some point, the memory required to analyse metagenomic data exceeds what an off-the-shelf laptop can manage. The Global Microbial Gene Catalog contains >300 million sequences. How can we use such resources without very large computational resources, for example, when mapping a dataset of short reads using NGLess? A natural answer is to split up the database and work on each segment separately, rather than on all of it at once. There are multiple ways to do this, the [simplest of...
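The excerpt does not say which splitting scheme the post settles on, so this is only a generic sketch of the idea: divide the reference records into nearly equal contiguous chunks, map against each chunk separately, then combine the per-chunk results. The merge step here assumes each chunk yields one best hit per read with a numeric score (a hypothetical format), and simply keeps the highest-scoring hit overall.

```python
def split_into_chunks(records, n_chunks):
    """Split a list of records into n_chunks contiguous, nearly equal parts."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be positive")
    size, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)  # spread the remainder
        chunks.append(records[start:end])
        start = end
    return chunks


def merge_best_hits(per_chunk_hits):
    """Combine {read: (target, score)} dicts, keeping each read's top-scoring hit."""
    best = {}
    for hits in per_chunk_hits:
        for read, (target, score) in hits.items():
            if read not in best or score > best[read][1]:
                best[read] = (target, score)
    return best


print(split_into_chunks(list(range(10)), 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Keeping only the best hit per read is one design choice among several; results can differ from mapping against the full database, for instance in how reads with several equally good targets are handled.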
Published on 2020.08.21 by Tobi Olanipekun, Celio Dias Santos Junior, Luis Pedro Coelho.
Abstract: When screening publicly available metagenomes for potential AMP sequences using Macrel, we found that some of the candidate sequences were homologous to azuC, a protein without a known biological function. AzuC was predicted to be a potential AMP; this study therefore investigated and analyzed the peptide's antimicrobial features through a series of in silico tests. Our results show that the azuC peptide looks like a typical antimicrobial peptide in terms of structure and charge distribution. AzuC was predicted to be active against K. pneumoniae and B. subtilis...
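Charge distribution is one of the properties used to judge whether a peptide looks like a typical (usually cationic) AMP. As a rough illustration, the sketch below counts basic (K, R) and acidic (D, E) residues, overall and in a sliding window. It deliberately ignores histidine, the termini, and real pKa values, so it is only a crude indicator; proper tools compute charge at a given pH. The azuC sequence is not given in this excerpt, so the example peptide is hypothetical.

```python
def approximate_net_charge(peptide):
    """Crude net charge at neutral pH: +1 per K/R, -1 per D/E.

    Histidine, the termini, and actual pKa values are ignored, so treat the
    result only as a rough indicator of how cationic a peptide is.
    """
    seq = peptide.upper()
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative


def charge_profile(peptide, window=5):
    """Net charge in each sliding window, to see how charges are distributed."""
    return [approximate_net_charge(peptide[i:i + window])
            for i in range(len(peptide) - window + 1)]


peptide = "GLFDIVKKVV"  # hypothetical example peptide
print(approximate_net_charge(peptide))  # 1
print(charge_profile(peptide))
```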
Published on 2020.08.20 by Fernanda Ordoñez Jiménez.
GMGC-mapper is a command-line tool that allows users to query the Global Microbial Gene Catalog v1.0 (GMGC), which combines metagenomic data and high-quality sequenced isolates to form a catalog with 303 million unigenes. Given a genome (or any other set of genes), GMGC-mapper finds metagenomes where they (or similar sequences) are present. It can also identify MAGs (metagenome-assembled genomes) that are similar. In version 0.2, the output consisted of the predicted gene sequences and the gene information, three tables and a readable...
Published on 2020.04.29 by Célio Dias Santos Júnior, Luis Pedro Coelho.
Translation of an mRNA requires a start codon. This is a particular sequence (typically AUG, although others exist), which indicates the position where translation should start. These codons also encode a methionine residue. Thus, immediately after translation, all proteins have a methionine at their N-terminus. However, after translation, this initial methionine can be removed by a process called N-terminal methionine excision (NME) (Wingfield, 2017). Note that NME is not carried out for all...
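Whether NME happens depends largely on the residue right after the initiator methionine: methionine aminopeptidases generally remove the Met only when the second residue has a small side chain (Gly, Ala, Ser, Cys, Thr, Pro, or Val). The sketch below applies that rule of thumb to predict a mature N-terminus. It is a simplification (it ignores, for example, N-terminal formylation in bacteria and other sequence context), and the example sequences are made up.

```python
# Residues with small side chains; when one of these follows the initiator Met,
# methionine aminopeptidase typically removes the Met (simplified rule of thumb).
SMALL_PENULTIMATE = set("GASCTPV")


def predicted_mature_n_terminus(protein):
    """Return the protein sequence after predicted N-terminal Met excision."""
    if len(protein) >= 2 and protein[0] == "M" and protein[1] in SMALL_PENULTIMATE:
        return protein[1:]  # initiator Met predicted to be cleaved off
    return protein  # Met predicted to be retained


print(predicted_mature_n_terminus("MAKLFV"))  # AKLFV
print(predicted_mature_n_terminus("MKALFV"))  # MKALFV
```

For short peptides such as AMPs, losing the first residue can noticeably change the properties of the mature molecule, which is why this step matters when predicting them from genomic sequence.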
Published on 2020.04.10 by Amy Houseman, Célio Dias Santos-Junior, Luis Pedro Coelho.
tl;dr Maybe there are a lot of cryptic peptides with antimicrobial properties, which, with a few mutations, can become independent genes. Right now, we have some speculative results on a case study, a molecule we are calling HG4. Our current work aims to make this more robust and to figure out whether it could be a general mechanism of AMP evolution.
## Introduction I: what are AMPs?
Antimicrobial peptides are short molecules, usually 10-100 residues in length, that interfere with microbial cells. Because of the small size of AMPs, their identification usually relies not on homology, but on machine...
Copyright (c) 2018–2024. Luis Pedro Coelho and other group members. All rights reserved.