
Published: 2025/1/21 17 豆豆

Developing a Blog Project with ABP vNext and .NET Core - Scheduled Tasks in Practice (Part 2)

Reposted from: https://github.com/meowv/blog

This post continues by scraping trending news from the major platforms across the web.

As before, you can preview the finished result on my personal blog first: https://meowv.com/hot 😝😝😝. The approach is much the same as for the wallpaper scraper.


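There are 18 sources to scrape this time: cnblogs (博客园), V2EX, SegmentFault, Juejin (掘金), WeChat Trending (微信热门), Douban Picks (豆瓣精选), ITHome (IT之家), 36Kr (36氪), Baidu Tieba (百度贴吧), Baidu Hot Search (百度热搜), Weibo Hot Search (微博热搜), Zhihu Hot List (知乎热榜), Zhihu Daily (知乎日报), NetEase News (网易新闻), GitHub, Douyin Trending (抖音热点), Douyin Videos (抖音视频), and Douyin Positive Energy (抖音正能量).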

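The data again goes into the database, so, step by step, first create the entity class and the custom repository. The entity is named HotNews; here is the code: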

```csharp
// HotNews.cs
using System;
using Volo.Abp.Domain.Entities;

namespace Meowv.Blog.Domain.HotNews
{
    public class HotNews : Entity<Guid>
    {
        /// <summary>
        /// Title
        /// </summary>
        public string Title { get; set; }

        /// <summary>
        /// Link
        /// </summary>
        public string Url { get; set; }

        /// <summary>
        /// Source ID
        /// </summary>
        public int SourceId { get; set; }

        /// <summary>
        /// Creation time
        /// </summary>
        public DateTime CreateTime { get; set; }
    }
}
```
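I'll leave the rest (the custom repository and the database mapping) to you; in the end the database gets a new, empty table named meowv_hotnews.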


Next, as before, put all the platforms into an enum in HotNewsEnum.cs.

```csharp
// HotNewsEnum.cs
using System.ComponentModel;

namespace Meowv.Blog.Domain.Shared.Enum
{
    public enum HotNewsEnum
    {
        [Description("博客园")] cnblogs = 1,
        [Description("V2EX")] v2ex = 2,
        [Description("SegmentFault")] segmentfault = 3,
        [Description("掘金")] juejin = 4,
        [Description("微信热门")] weixin = 5,
        [Description("豆瓣精选")] douban = 6,
        [Description("IT之家")] ithome = 7,
        [Description("36氪")] kr36 = 8,
        [Description("百度贴吧")] tieba = 9,
        [Description("百度热搜")] baidu = 10,
        [Description("微博热搜")] weibo = 11,
        [Description("知乎热榜")] zhihu = 12,
        [Description("知乎日报")] zhihudaily = 13,
        [Description("网易新闻")] news163 = 14,
        [Description("GitHub")] github = 15,
        [Description("抖音热点")] douyin_hot = 16,
        [Description("抖音视频")] douyin_video = 17,
        [Description("抖音正能量")] douyin_positive = 18
    }
}
```
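The [Description] attributes give each platform a display name, while the integer values are presumably what ends up in the SourceId column of HotNews. As a minimal sketch (GetDescription is a hypothetical helper, not part of the original project, and the enum is trimmed to three members so the example runs on its own), the display name can be read back with reflection:

```csharp
using System;
using System.ComponentModel;
using System.Reflection;

// Three members copied from HotNewsEnum, enough for the example
public enum HotNewsEnum
{
    [Description("博客园")] cnblogs = 1,
    [Description("GitHub")] github = 15,
    [Description("抖音正能量")] douyin_positive = 18
}

public static class EnumDescriptionExtensions
{
    // Hypothetical helper: returns the [Description] text of an enum value,
    // falling back to ToString() when no such attribute exists
    // (for example, an integer that matches no member).
    public static string GetDescription(this Enum value)
    {
        var name = value.ToString();
        var field = value.GetType().GetField(name);
        var attribute = field?.GetCustomAttribute<DescriptionAttribute>();
        return attribute?.Description ?? name;
    }
}
```

For example, `HotNewsEnum.cnblogs.GetDescription()` returns "博客园", and `(int)HotNewsEnum.cnblogs` is 1.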
As with the wallpaper scraper in the previous post, some preparation comes first.

Add HotNewsJobItem to the .Application.Contracts layer, and add HotNewsJob to the .BackgroundJobs layer to hold the scraping logic, injecting the repository IHotNewsRepository through the constructor.

```csharp
// HotNewsJobItem.cs
using Meowv.Blog.Domain.Shared.Enum;

namespace Meowv.Blog.Application.Contracts.HotNews
{
    public class HotNewsJobItem<T>
    {
        /// <summary>
        /// Payload: the URL to fetch, or the fetched content
        /// </summary>
        public T Result { get; set; }

        /// <summary>
        /// Source
        /// </summary>
        public HotNewsEnum Source { get; set; }
    }
}
```
```csharp
// HotNewsJob.cs
using Meowv.Blog.Domain.HotNews.Repositories;
using System;
using System.Net.Http;
using System.Threading.Tasks;

namespace Meowv.Blog.BackgroundJobs.Jobs.HotNews
{
    public class HotNewsJob : IBackgroundJob
    {
        private readonly IHttpClientFactory _httpClient;
        private readonly IHotNewsRepository _hotNewsRepository;

        public HotNewsJob(IHttpClientFactory httpClient,
                          IHotNewsRepository hotNewsRepository)
        {
            _httpClient = httpClient;
            _hotNewsRepository = hotNewsRepository;
        }

        public async Task ExecuteAsync()
        {
            // Scraping logic still to be implemented
            throw new NotImplementedException();
        }
    }
}
```
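Next, pin down the data source URLs. Some of the sources above return HTML while others return JSON directly, so to make the HTTP calls convenient I also injected IHttpClientFactory.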

The finished list of sources to scrape looks like this:


```csharp
var hotNewsUrls = new List<HotNewsJobItem<string>>
{
    new HotNewsJobItem<string> { Result = "https://www.cnblogs.com", Source = HotNewsEnum.cnblogs },
    new HotNewsJobItem<string> { Result = "https://www.v2ex.com/?tab=hot", Source = HotNewsEnum.v2ex },
    new HotNewsJobItem<string> { Result = "https://segmentfault.com/hottest", Source = HotNewsEnum.segmentfault },
    new HotNewsJobItem<string> { Result = "https://web-api.juejin.im/query", Source = HotNewsEnum.juejin },
    new HotNewsJobItem<string> { Result = "https://weixin.sogou.com", Source = HotNewsEnum.weixin },
    new HotNewsJobItem<string> { Result = "https://www.douban.com/group/explore", Source = HotNewsEnum.douban },
    new HotNewsJobItem<string> { Result = "https://www.ithome.com", Source = HotNewsEnum.ithome },
    new HotNewsJobItem<string> { Result = "https://36kr.com/newsflashes", Source = HotNewsEnum.kr36 },
    new HotNewsJobItem<string> { Result = "http://tieba.baidu.com/hottopic/browse/topiclist", Source = HotNewsEnum.tieba },
    new HotNewsJobItem<string> { Result = "http://top.baidu.com/buzz?b=341", Source = HotNewsEnum.baidu },
    new HotNewsJobItem<string> { Result = "https://s.weibo.com/top/summary/summary", Source = HotNewsEnum.weibo },
    new HotNewsJobItem<string> { Result = "https://www.zhihu.com/api/v3/feed/topstory/hot-lists/total?limit=50&desktop=true", Source = HotNewsEnum.zhihu },
    new HotNewsJobItem<string> { Result = "https://daily.zhihu.com", Source = HotNewsEnum.zhihudaily },
    new HotNewsJobItem<string> { Result = "http://news.163.com/special/0001386f/rank_whole.html", Source = HotNewsEnum.news163 },
    new HotNewsJobItem<string> { Result = "https://github.com/trending", Source = HotNewsEnum.github },
    new HotNewsJobItem<string> { Result = "https://www.iesdouyin.com/web/api/v2/hotsearch/billboard/word", Source = HotNewsEnum.douyin_hot },
    new HotNewsJobItem<string> { Result = "https://www.iesdouyin.com/web/api/v2/hotsearch/billboard/aweme", Source = HotNewsEnum.douyin_video },
    new HotNewsJobItem<string> { Result = "https://www.iesdouyin.com/web/api/v2/hotsearch/billboard/aweme/?type=positive", Source = HotNewsEnum.douyin_positive },
};
```
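HotNewsEnum and this list have to stay in sync: 18 sources, one entry each. A small check like the sketch below can flag a forgotten or duplicated source. It is not part of the original job; SourceListCheck is a hypothetical name, and the enum is trimmed to three members so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed to three members so the example is self-contained
public enum HotNewsEnum
{
    cnblogs = 1,
    v2ex = 2,
    segmentfault = 3
}

public class HotNewsJobItem<T>
{
    public T Result { get; set; }
    public HotNewsEnum Source { get; set; }
}

public static class SourceListCheck
{
    // Hypothetical helper: compares the URL list against every HotNewsEnum member
    public static (List<HotNewsEnum> Missing, List<HotNewsEnum> Duplicated) Check(
        IEnumerable<HotNewsJobItem<string>> items)
    {
        // How many list entries point at each source
        var counts = items
            .GroupBy(x => x.Source)
            .ToDictionary(g => g.Key, g => g.Count());

        // Enum members with no entry at all
        var missing = Enum.GetValues(typeof(HotNewsEnum))
            .Cast<HotNewsEnum>()
            .Where(s => !counts.ContainsKey(s))
            .ToList();

        // Sources listed more than once
        var duplicated = counts
            .Where(kv => kv.Value > 1)
            .Select(kv => kv.Key)
            .ToList();

        return (missing, duplicated);
    }
}
```

Run against the real 18-entry list, both returned lists should be empty.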

A few of them need special handling: Juejin, Baidu Hot Search, and NetEase News.

Juejin requires a POST request with its own request headers and request body, and it returns JSON, so an HttpClient is created from IHttpClientFactory for it.
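The Juejin special case boils down to a POST with two custom headers and a JSON body. The sketch below builds such a request with HttpRequestMessage and StringContent instead of DefaultRequestHeaders plus ByteArrayContent; that swap is an equivalent alternative I am assuming, not the author's code, and JuejinRequest is a hypothetical name. It only builds the request, so it runs without network access.

```csharp
using System;
using System.Net.Http;
using System.Text;

public static class JuejinRequest
{
    // Hypothetical helper: builds, but does not send, a POST like the one described above
    public static HttpRequestMessage Build(string url, string jsonBody)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, url)
        {
            // UTF-8 body with Content-Type: application/json; charset=utf-8
            Content = new StringContent(jsonBody, Encoding.UTF8, "application/json")
        };

        // Headers are set on this request only, not on client.DefaultRequestHeaders
        request.Headers.Add("User-Agent",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.14 Safari/537.36 Edg/83.0.478.13");
        request.Headers.Add("X-Agent", "Juejin/Web");
        return request;
    }
}
```

Sending it is then `await client.SendAsync(request)` with a client created by IHttpClientFactory.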

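Baidu Hot Search and NetEase News, the two old-timers, play a trick on us: their pages are encoded in GB2312, so the encoding has to be specified explicitly; otherwise everything comes back garbled.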
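On .NET Core, code-page encodings such as GB2312 are only available after registering CodePagesEncodingProvider, which is what the article's Encoding.RegisterProvider call is for. A minimal sketch (Gb2312Demo is a hypothetical name) showing that the same bytes decode correctly as GB2312 but not as UTF-8:

```csharp
using System;
using System.Text;

public static class Gb2312Demo
{
    // Code-page encodings are not built into .NET Core's default set;
    // registering CodePagesEncodingProvider makes Encoding.GetEncoding("gb2312") work.
    public static Encoding GetGb2312()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("gb2312");
    }
}
```

With a raw HttpClient response, the equivalent is reading the body with `ReadAsByteArrayAsync()` and passing the bytes to this encoding's `GetString`; HtmlAgilityPack's `LoadFromWebAsync(url, encoding)` does that decoding for you.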


```csharp
// Inside HotNewsJob.ExecuteAsync(). Needs: using HtmlAgilityPack; using System.Collections.Generic;
// using System.Net.Http; using System.Net.Http.Headers; using System.Text; using System.Threading.Tasks;
var web = new HtmlWeb();
var list_task = new List<Task<HotNewsJobItem<object>>>();

// Register code-page encodings (GB2312) once, before any task needs them
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

hotNewsUrls.ForEach(item =>
{
    var task = Task.Run(async () =>
    {
        var obj = new object();

        if (item.Source == HotNewsEnum.juejin)
        {
            // Juejin: POST with custom headers and a JSON body; the response is JSON
            using var client = _httpClient.CreateClient();
            client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.14 Safari/537.36 Edg/83.0.478.13");
            client.DefaultRequestHeaders.Add("X-Agent", "Juejin/Web");

            var data = "{\"extensions\":{\"query\":{ \"id\":\"21207e9ddb1de777adeaca7a2fb38030\"}},\"operationName\":\"\",\"query\":\"\",\"variables\":{ \"first\":20,\"after\":\"\",\"order\":\"THREE_DAYS_HOTTEST\"}}";
            var buffer = data.SerializeUtf8(); // project extension method, presumably Encoding.UTF8.GetBytes
            var byteContent = new ByteArrayContent(buffer);
            byteContent.Headers.ContentType = new MediaTypeHeaderValue("application/json");

            var httpResponse = await client.PostAsync(item.Result, byteContent);
            obj = await httpResponse.Content.ReadAsStringAsync();
        }
        else
        {
            // All other sources go through HtmlWeb; Baidu and NetEase are GB2312, the rest UTF-8
            var encoding = item.Source == HotNewsEnum.baidu || item.Source == HotNewsEnum.news163
                ? Encoding.GetEncoding("gb2312")
                : Encoding.UTF8;
            obj = await web.LoadFromWebAsync(item.Result, encoding);
        }

        return new HotNewsJobItem<object>
        // …
```
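The listing above breaks off, but its shape is clear: start one task per source, then presumably wait for all of them together. A minimal sketch of that fan-out with Task.WhenAll, assuming a fetch delegate that stands in for the real HTML/JSON download (HotNewsFetcher and the fetch parameter are hypothetical; the enum is trimmed to two members):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Trimmed to two members so the example is self-contained
public enum HotNewsEnum { cnblogs = 1, v2ex = 2 }

public class HotNewsJobItem<T>
{
    public T Result { get; set; }
    public HotNewsEnum Source { get; set; }
}

public static class HotNewsFetcher
{
    // Starts one fetch per source and awaits them all together.
    // `fetch` is a hypothetical stand-in for the real HTTP/HtmlWeb logic.
    public static async Task<HotNewsJobItem<object>[]> FetchAllAsync(
        IEnumerable<HotNewsJobItem<string>> urls,
        Func<HotNewsJobItem<string>, Task<object>> fetch)
    {
        var tasks = urls.Select(async item => new HotNewsJobItem<object>
        {
            Result = await fetch(item),
            Source = item.Source
        }).ToList();

        // Results come back in the same order as the input list
        return await Task.WhenAll(tasks);
    }
}
```

Because the work is I/O-bound, the async lambda starts each request without Task.Run. Note that a single failed source makes the whole `await Task.WhenAll` throw, so wrapping `fetch` in try/catch is worth considering if partial results are acceptable.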