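I recently worked through a pretty good e-commerce project. The author provides complete sample data, including product information and images. However, the images are fixed URLs, and the product details are HTML whose img tags also point to URLs. In my experience, online CDNs like this tend to go down, so I decided to extract the product images from the data and host them on my own Tencent Cloud server to keep them accessible.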
Demo data
```json
[
  {
    "ID": "b93e59e214fc4478ac72652a2c87fe54",
    "GOODS_SERIAL_NUMBER": "2300000059885",
    "SHOP_ID": "402880e860166f3c0160167897d60002",
    "SUB_ID": "402880e86016d1b5016016dcd7c50004",
    "GOOD_TYPE": 1,
    "STATE": 0,
    "IS_DELETE": 1,
    "NAME": "云南红提800g/盒",
    "ORI_PRICE": 18,
    "PRESENT_PRICE": 15,
    "AMOUNT": 10000,
    "DETAIL": "<img src=\"http://images.koow.cc/shopGoodsDetailImg/20171225/20171225112029_9395.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20171225/20171225112029_3391.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20171225/20171225112029_7603.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20171225/20171225112029_4718.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20171225/20171225112030_778.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20171225/20171225112030_2602.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20171225/20171225112030_7913.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20171225/20171225112030_202.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20171225/20171225112030_4296.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20171225/20171225112030_6956.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20171225/20171225112030_8200.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20171225/20171225112031_3967.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20171225/20171225112031_5114.jpg\" width=\"100%\" height=\"auto\" alt=\"\" />",
    "BRIEF": null,
    "SALES_COUNT": 0,
    "IMAGE1": "http://images.koow.cc/shopGoodsImg/20171225/20171225112020_561.jpg",
    "IMAGE2": null,
    "IMAGE3": null,
    "IMAGE4": null,
    "IMAGE5": null,
    "ORIGIN_PLACE": null,
    "GOOD_SCENT": null,
    "CREATE_TIME": 1514172047397,
    "UPDATE_TIME": 1522037064430,
    "IS_RECOMMEND": 0,
    "PICTURE_COMPERSS_PATH": "http://images.koow.cc/compressedPic/20171225112020_561.jpg"
  },
  {
    "ID": "e0ab2f6e2802443ba117b1146cf85fee",
    "GOODS_SERIAL_NUMBER": "4894375014863",
    "SHOP_ID": "402880e860166f3c0160167897d60002",
    "SUB_ID": "2c9f6c94609a62be0160a02d1dc20021",
    "GOOD_TYPE": 1,
    "STATE": 0,
    "IS_DELETE": 1,
    "NAME": "菓子町园道乳酸菌味夹心饼干(抹茶味)540/罐",
    "ORI_PRICE": 29.8,
    "PRESENT_PRICE": 29.8,
    "AMOUNT": 10000,
    "DETAIL": "<img src=\"http://images.koow.cc/shopGoodsDetailImg/20180213/20180213110655_230.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20180213/20180213110656_329.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20180213/20180213110656_2659.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20180213/20180213110656_9521.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20180213/20180213110656_8611.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20180213/20180213110656_1390.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20180213/20180213110656_7291.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20180213/20180213110657_3919.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20180213/20180213110657_2170.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20180213/20180213110657_4402.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20180213/20180213110657_1926.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20180213/20180213110657_9438.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20180213/20180213110657_4361.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20180213/20180213110657_2730.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20180213/20180213110658_314.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20180213/20180213110658_8779.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20180213/20180213110658_9878.jpg\" width=\"100%\" height=\"auto\" alt=\"\" /><img src=\"http://images.koow.cc/shopGoodsDetailImg/20180213/20180213110658_3471.jpg\" width=\"100%\" height=\"auto\" alt=\"\" />",
    "BRIEF": null,
    "SALES_COUNT": 0,
    "IMAGE1": "http://images.koow.cc/shopGoodsImg/20180213/20180213110648_2744.jpg",
    "IMAGE2": null,
    "IMAGE3": null,
    "IMAGE4": null,
    "IMAGE5": null,
    "ORIGIN_PLACE": null,
    "GOOD_SCENT": null,
    "CREATE_TIME": 1518491222336,
    "UPDATE_TIME": 1523174099461,
    "IS_RECOMMEND": 0,
    "PICTURE_COMPERSS_PATH": "http://images.koow.cc/compressedPic/20180213110648_2744.jpg"
  }
]
```
As you can see, the data is fairly complete, including the ID, serial number, name, price, description, and other fields.
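Extracting the image URLs from these JSON objects is easy for the IMAGE1-IMAGE5 fields: just iterate over them. The image URLs in DETAIL, however, are embedded in HTML and cannot be read directly, so we pull them out with a regular expression. Step by step: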
Extracting image URLs from IMAGE1-IMAGE5
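```javascript
const fs = require("fs");

fs.readFile("./goods_demo.json", "utf8", (err, data) => {
  if (err) {
    console.error("Failed to read goods_demo.json:", err);
    return;
  }
  // Parse the JSON text into an array of goods objects
  const goods = JSON.parse(data);
  goods.forEach((item) => {
    for (let i = 1; i <= 5; i++) {
      const url = item[`IMAGE${i}`];
      // Skip empty slots (null or missing fields)
      if (!url) continue;
      // Append one URL per line to result.txt; the synchronous call keeps the lines in order
      try {
        fs.appendFileSync("./result.txt", `${url}\n`);
        console.log("Write succeeded");
      } catch (e) {
        console.log("Write failed");
      }
    }
  });
});
```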
After running the code above with Node.js, the URLs in the IMAGE fields are read correctly and written to result.txt.
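The loop above checks exactly five slots, IMAGE1 to IMAGE5. If you would rather not hard-code that count, here is a minimal sketch that picks up every field named IMAGE followed by a number. The function name imageFieldUrls and the naming assumption are mine, not part of the original project.

```javascript
// Hypothetical helper: collect the values of every IMAGE<n> field.
// Assumption: image fields are always named "IMAGE" followed by digits.
function imageFieldUrls(item) {
  return Object.keys(item)
    .filter((key) => /^IMAGE\d+$/.test(key))
    // Sort by the numeric suffix so that IMAGE10 would come after IMAGE9
    .sort((a, b) => Number(a.slice(5)) - Number(b.slice(5)))
    .map((key) => item[key])
    // Drop empty slots (null, undefined, or empty string)
    .filter(Boolean);
}

// A trimmed-down goods object in the shape of the demo data
const sample = {
  NAME: "demo",
  IMAGE1: "http://images.koow.cc/shopGoodsImg/20171225/20171225112020_561.jpg",
  IMAGE2: null,
  IMAGE3: null,
  IMAGE4: null,
  IMAGE5: null,
  PICTURE_COMPERSS_PATH: "http://images.koow.cc/compressedPic/20171225112020_561.jpg",
};

console.log(imageFieldUrls(sample));
```

Note that PICTURE_COMPERSS_PATH is not picked up, because the pattern only accepts IMAGE plus digits; the inner for loop in the code above could be replaced by a call to this function.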
Extracting image URLs from DETAIL
Looking at the URLs, each image URL consists of the http:// prefix (part1), the CDN host (part2), the directories the image lives in (part3), and the image file name (part4):
"http://(part1)images.koow.cc(part2)/shopGoodsImg(part3)/20171225(part3)/20171225112020_561.jpg(part4)"
Based on this structure, the URLs can be matched with the following regular expression:
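```javascript
// \w matches any letter, digit, or underscore
// "/" must be escaped as \/, and "." in the host as \. (an unescaped "." matches any character)
// {2,5} means the preceding group occurs 2 to 5 times
// the g flag makes the match global (find every URL, not just the first)
const urlReg = /http:\/\/images\.koow\.cc(\/\w+){2,5}\.jpg/g;
```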
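Before running the regular expression over the whole file, it is worth trying it on a small string. The sketch below uses a shortened, hypothetical DETAIL value containing two img tags from the demo data, with the dots in images.koow.cc escaped as `\.`, and also shows what goes wrong when they are not escaped.

```javascript
// The URL regex with the dots in the host escaped so "." only matches a literal dot
const urlReg = /http:\/\/images\.koow\.cc(\/\w+){2,5}\.jpg/g;

// A shortened DETAIL value in the same shape as the demo data (hypothetical sample)
const detail =
  '<img src="http://images.koow.cc/shopGoodsDetailImg/20171225/20171225112029_9395.jpg" width="100%" height="auto" alt="" />' +
  '<img src="http://images.koow.cc/shopGoodsDetailImg/20171225/20171225112029_3391.jpg" width="100%" height="auto" alt="" />';

// With the g flag, String.prototype.match returns every full match, or null if there is none
const urls = detail.match(urlReg) || [];
console.log(urls);

// Why escaping matters: an unescaped "." accepts any character in that position
console.log(/images.koow.cc/.test("imagesXkoowXcc"));   // true
console.log(/images\.koow\.cc/.test("imagesXkoowXcc")); // false
```

The `|| []` guard matters because `match` returns null rather than an empty array when a DETAIL string contains no image URLs.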
With the DETAIL handling added, the complete code looks like this:
```javascript
const fs = require("fs");

// Image URLs on the images.koow.cc CDN (dots escaped so they match literally)
const urlReg = /http:\/\/images\.koow\.cc(\/\w+){2,5}\.jpg/g;

// Append one URL per line; the synchronous call keeps the file in a stable order
function writeUrl(url, label) {
  try {
    fs.appendFileSync("./result.txt", `${url}\n`);
    console.log(`${label}: write succeeded`);
  } catch (e) {
    console.log(`${label}: write failed`);
  }
}

fs.readFile("./goods_demo.json", "utf8", (err, data) => {
  if (err) {
    console.error("Failed to read goods_demo.json:", err);
    return;
  }
  const goods = JSON.parse(data);
  goods.forEach((item) => {
    if (item.DETAIL) {
      // match() with the g flag returns every URL found, or null if there are none
      const urls = item.DETAIL.match(urlReg) || [];
      urls.forEach((url) => writeUrl(url, "DETAIL"));
    }
    for (let i = 1; i <= 5; i++) {
      const url = item[`IMAGE${i}`];
      if (url) writeUrl(url, `IMAGE${i}`);
    }
  });
});
```
The extracted URLs end up in result.txt, ready for the next step:
```
http://images.koow.cc/shopGoodsDetailImg/20171225/20171225112029_9395.jpg
http://images.koow.cc/shopGoodsDetailImg/20171225/20171225112029_3391.jpg
http://images.koow.cc/shopGoodsDetailImg/20171225/20171225112029_4718.jpg
http://images.koow.cc/shopGoodsDetailImg/20171225/20171225112029_7603.jpg
……
```
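Calling the asynchronous `fs.appendFile` inside a loop starts all the writes at once, and Node.js does not guarantee they finish in the order they were started. That is a plausible reason the sample output lists `_4718` before `_7603` even though DETAIL has them the other way round. A minimal alternative sketch: collect every URL first in a pure function, then write the file in a single call. The function name `extractImageUrls` and the Set-based de-duplication are my own additions, not the original project's; the `existsSync` check only keeps the sketch runnable when goods_demo.json is absent.

```javascript
const fs = require("fs");

const urlReg = /http:\/\/images\.koow\.cc(\/\w+){2,5}\.jpg/g;

// Hypothetical helper: goods array in, ordered list of unique image URLs out
function extractImageUrls(goods) {
  const seen = new Set(); // a Set keeps insertion order and drops duplicates
  for (const item of goods) {
    // DETAIL may be null; match() returns null when nothing is found
    for (const url of (item.DETAIL || "").match(urlReg) || []) seen.add(url);
    for (let i = 1; i <= 5; i++) {
      const url = item[`IMAGE${i}`];
      if (url) seen.add(url);
    }
  }
  return [...seen];
}

// Write all URLs in one call, one per line (only when the demo file is present)
if (fs.existsSync("./goods_demo.json")) {
  const goods = JSON.parse(fs.readFileSync("./goods_demo.json", "utf8"));
  fs.writeFileSync("./result.txt", extractImageUrls(goods).join("\n") + "\n");
}
```

Because `writeFileSync` replaces result.txt on every run, re-running the script does not pile up duplicate lines the way repeated appends do.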
Batch download
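To run a private CDN server of your own, the storage path of each file must not change; otherwise it will no longer match the paths stored in the database. How do you keep the image directories intact during a batch download? Simply use the wget command: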
```shell
wget -nc -r -i ./result.txt
```
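-nc, --no-clobber: skip files that already exist locally instead of downloading them again
-r, --recursive: recursive retrieval; in this mode wget saves every file under a directory tree named after the host and the URL path (for example images.koow.cc/shopGoodsImg/20171225/20171225112020_561.jpg), which is what preserves the original paths
-i, --input-file: read the URLs to download from the given file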
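The directory layout that `-r` produces is simply the host followed by the URL path. For environments without wget, here is a minimal Node.js sketch of the same idea, assuming plain http URLs (as in the demo data), no redirects, and no retries; `localPathFor` and `download` are hypothetical helper names. Like `-nc`, it skips files that already exist, which also means a partially written file from a failed run would be skipped too.

```javascript
const fs = require("fs");
const http = require("http");
const path = require("path");

// Map a URL to the local path that mirrors it: <rootDir>/<host>/<url path>
function localPathFor(imageUrl, rootDir = ".") {
  const { host, pathname } = new URL(imageUrl);
  return path.join(rootDir, host, ...pathname.split("/").filter(Boolean));
}

// Download one URL to its mirrored path; skip it if the file already exists (like -nc)
function download(imageUrl, rootDir = ".") {
  const target = localPathFor(imageUrl, rootDir);
  if (fs.existsSync(target)) return Promise.resolve(target);
  fs.mkdirSync(path.dirname(target), { recursive: true });
  return new Promise((resolve, reject) => {
    http
      .get(imageUrl, (res) => {
        if (res.statusCode !== 200) {
          res.resume(); // discard the body so the socket is released
          reject(new Error(`${imageUrl}: HTTP ${res.statusCode}`));
          return;
        }
        const file = fs.createWriteStream(target);
        res.pipe(file);
        file.on("finish", () => resolve(target));
        file.on("error", reject);
      })
      .on("error", reject);
  });
}

console.log(localPathFor("http://images.koow.cc/shopGoodsImg/20171225/20171225112020_561.jpg"));
```

To fetch everything one file at a time, read the lines of result.txt into an array `urls` and run `for (const url of urls) await download(url);` inside an async function.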
Summary
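Processing JSON or XML data is an essential skill for programmers. Knowing efficient ways to handle data lets you get twice the result with half the effort and avoids wasted time. I wrote this article hoping it helps anyone with the same need, and I hope you, reading this at your computer, will share your own data-processing tricks as well!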