Preparing a KEGG Enrichment Database for Non-Model Species (Part 2)

2023-07-12 14:29

I. Downloading the KEGG data

1. Go to the official site: https://www.kegg.jp/

2. Open the KO (KEGG ORTHOLOGY) Database.

3. On that page, use the organism selection option to choose a species.

4. This example uses zebrafish, so select dre.

5. Download the JSON file to your computer.
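The manual download above could probably also be scripted. The sketch below is only an assumption-based alternative, not the method used in this post: it assumes the KEGG REST service at https://rest.kegg.jp serves the same hierarchy as a BRITE entry named dre00001, and the function names and the output file name are made up for illustration. Check the URL in a browser before relying on it.

```python
import urllib.request

# Assumption: KEGG REST base URL; the post itself only uses the website.
KEGG_REST_BASE = "https://rest.kegg.jp"


def brite_json_url(org_code: str) -> str:
    """Build the assumed REST URL for an organism's KO hierarchy, e.g. dre00001."""
    return f"{KEGG_REST_BASE}/get/br:{org_code}00001/json"


def download_brite_json(org_code: str, out_path: str) -> None:
    """Fetch the hierarchy and save it as a local JSON file (needs network access)."""
    with urllib.request.urlopen(brite_json_url(org_code)) as resp:
        data = resp.read()
    with open(out_path, "wb") as out:
        out.write(data)


# Hypothetical usage; uncomment to actually download:
# download_brite_json("dre", "dre00001.json")
```

For another species, replace "dre" with that organism's three- or four-letter KEGG code.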

II. Processing the JSON file

    import json
    import re

    # Placeholder paths (not from the original post); change them to your own files.
    json_path = "dre00001.json"
    outfile = "dre_pathway_gene.tsv"

    K_ko_dict = {}
    # Do not call the path variable "json": that would shadow the json module.
    with open(json_path, "r") as f:
        K_ko_file_content = json.load(f)

    # The hierarchy has three levels of categories, then the gene entries.
    for children_info in K_ko_file_content.get("children", []):
        for next_children_info in children_info.get("children", []):
            for third_children_info in next_children_info.get("children", []):
                name_info = third_children_info.get("name")
                # e.g. "00010 Glycolysis / Gluconeogenesis [PATH:dre00010]"
                pathway_id = re.findall(r'PATH:(.*)]', name_info)
                pathway_name = re.findall(r'\d+\s(.*)\s\[', name_info)
                if not (pathway_id and pathway_name):
                    continue  # skip entries that are not pathway maps
                key = pathway_id[0] + "\t" + pathway_name[0]
                K_ko_dict[key] = []
                for fourth_children_info in third_children_info.get("children", []):
                    fields = fourth_children_info.get("name").split(" ")
                    if len(fields) < 2:
                        continue  # entry has no gene name after the ID
                    K_name = fields[0]
                    gene_name = fields[1].replace(";", "")
                    K_ko_dict[key].append(K_name + "\t" + gene_name)

    # Write one row per gene, sorted by pathway and then by gene.
    with open(outfile, "w") as out:
        out.write("pathway_gene_id\tgene_name\tpathway_id\tpathway_name\n")
        for key in sorted(K_ko_dict.keys()):
            K_ko_dict[key].sort()
            for i in K_ko_dict[key]:
                out.write(i + "\t" + key + "\n")
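To make the parsing step easier to follow, here is a small sketch of what the two regular expressions and the split on spaces extract from a single name string. The pathway string follows the format the regexes expect; the gene entry is a made-up example, not a real KEGG record.

```python
import re

# Sample strings for illustration only; gene_entry is invented.
pathway_entry = "00010 Glycolysis / Gluconeogenesis [PATH:dre00010]"
gene_entry = "123456 abc1; example protein"

# Text between "PATH:" and the closing bracket is the pathway ID.
pathway_id = re.findall(r'PATH:(.*)]', pathway_entry)
# Text between the leading number and the opening bracket is the pathway name.
pathway_name = re.findall(r'\d+\s(.*)\s\[', pathway_entry)

# The first two space-separated fields are the gene ID and the gene name.
fields = gene_entry.split(" ")
gene_id = fields[0]
gene_name = fields[1].replace(";", "")

print(gene_id, gene_name, pathway_id[0], pathway_name[0], sep="\t")
```

The printed line has the same column order as the output file: pathway_gene_id, gene_name, pathway_id, pathway_name.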

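The processed file is tab-separated, with a header row and four columns: pathway_gene_id, gene_name, pathway_id, and pathway_name.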

If you need gene IDs rather than gene names, convert the gene names using a GTF annotation file.
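One possible way to do that conversion is sketched below. It assumes an Ensembl-style GTF whose "gene" rows carry both gene_id and gene_name attributes; the function name and the sample line are hypothetical. Gene names in KEGG and in the GTF may differ in case or spelling, so expect some names to stay unmatched.

```python
import re

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')
GENE_NAME_RE = re.compile(r'gene_name "([^"]+)"')


def gene_name_to_id(gtf_lines):
    """Map gene_name -> gene_id using the 'gene' rows of a GTF file."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header comments
        cols = line.rstrip("\n").split("\t")
        # A GTF row has 9 tab-separated columns; column 3 is the feature type.
        if len(cols) < 9 or cols[2] != "gene":
            continue
        gid = GENE_ID_RE.search(cols[8])
        gname = GENE_NAME_RE.search(cols[8])
        if gid and gname:
            # Keep the first ID seen if a name occurs more than once.
            mapping.setdefault(gname.group(1), gid.group(1))
    return mapping


# Made-up GTF line for illustration; with a real file, pass open("your.gtf").
sample = ['1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "GENE0001"; gene_name "abc1";\n']
name_to_id = gene_name_to_id(sample)
```

The resulting dictionary can then be used to look up the gene_name column of the processed file.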

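Note: if json.load(f) fails with AttributeError: 'str' object has no attribute 'load', the file path has been stored in a variable named json. That name collides with import json and replaces the module with a string, so json.load no longer exists. Rename the path variable to fix the error.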
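The name collision can be reproduced in a few lines. This is a minimal sketch with made-up function names and an in-memory file, only meant to show why the error appears and why renaming the variable fixes it.

```python
import io
import json


def load_with_shadowed_name(handle):
    # Rebinding "json" to a string hides the module inside this function.
    json = "dre00001.json"
    return json.load(handle)  # fails: a str has no attribute 'load'


def load_with_distinct_name(handle):
    json_path = "dre00001.json"  # a different name leaves the module untouched
    return json.load(handle)


try:
    load_with_shadowed_name(io.StringIO("{}"))
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)

result = load_with_distinct_name(io.StringIO('{"name": "dre00001"}'))
```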
