Sharing PHP code to scrape trending news from the Baidu hot search list

2020-09-04

Works


How it works:

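The PHP script requests the full HTML of the Baidu homepage, then uses a regular expression to extract the JSON data that Baidu embeds in the page for its hot search list. After fetching the complete JSON, the script tidies it up locally and stores the result in a txt file. The data is saved once per day, and every call reads directly from that txt file, so the Baidu homepage is not fetched again until the next day.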
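The extraction step can be sketched as a standalone function. This is a minimal sketch, not the author's exact code: it assumes the page still contains a `<textarea id="hotsearch_data">` holding JSON with a `hotsearch` array whose items carry `pure_title`, `linkurl` and `heat_score` (the field names the script below relies on). Baidu can change this markup at any time, so the function returns an empty array instead of failing when nothing matches. The sample HTML is made up for illustration.

```php
<?php
// Extract the hot-search JSON embedded in the Baidu homepage HTML and keep
// only the three fields the API exposes (name, url, rudu).
// Assumption: the markup and field names match what the original script uses.
function extractHotSearch(string $html): array
{
    // '#' as the delimiter so the '/' in </textarea> needs no escaping;
    // the 's' modifier lets '.' match newlines inside the textarea.
    if (!preg_match('#<textarea id="hotsearch_data"[^>]*>(.*?)</textarea>#s', $html, $m)) {
        return [];
    }
    $data = json_decode($m[1], true);
    if (!is_array($data) || !isset($data['hotsearch']) || !is_array($data['hotsearch'])) {
        return [];
    }
    $items = [];
    foreach ($data['hotsearch'] as $v) {
        $items[] = [
            'name' => $v['pure_title'] ?? '',
            'url'  => urldecode($v['linkurl'] ?? ''), // linkurl is URL-encoded
            'rudu' => $v['heat_score'] ?? '',
        ];
    }
    return $items;
}

// Usage with a hypothetical, hand-written HTML snippet:
$sample = '<textarea id="hotsearch_data" style="display:none;">'
    . '{"hotsearch":[{"pure_title":"Example","linkurl":"https%3A%2F%2Fwww.baidu.com%2Fs%3Fwd%3Dexample","heat_score":"123"}]}'
    . '</textarea>';
print_r(extractHotSearch($sample));
```

Keeping the parsing in a pure function like this makes it easy to test against a saved copy of the homepage when Baidu changes its layout.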

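The only drawback is that the hot topics on the Baidu homepage are probably not updated once a day; they may change in real time or every few hours. For the sake of performance, though, I don't check for that here and only fetch once per day.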
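If once a day turns out to be too coarse, one option is to expire the cache by age instead of by calendar date. The sketch below is an alternative, not what the script in this post does: `cachedJson()` and the `$fetch` callable are hypothetical names, where `$fetch` is assumed to return the JSON string to cache (in this post, the result of the curl request plus the regex and `json_encode` steps) or `null` on failure. It also writes through a temporary file and `rename()`, so a concurrent request never reads a half-written cache file.

```php
<?php
// Time-based cache: refresh when the cache file is older than $ttl seconds.
// $fetch is a hypothetical callable returning the JSON string, or null on failure.
function cachedJson(string $file, int $ttl, callable $fetch): string
{
    clearstatcache(true, $file);
    // Serve the cached copy while it is younger than $ttl seconds.
    if (is_file($file) && time() - filemtime($file) < $ttl) {
        return (string) file_get_contents($file);
    }

    $json = $fetch();
    if ($json === null) {
        // Fetch failed: fall back to a stale copy if one exists.
        return is_file($file) ? (string) file_get_contents($file) : '[]';
    }

    $dir = dirname($file);
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true);
    }
    // Write to a temporary file in the same directory, then rename it into
    // place, so readers see either the old file or the complete new one.
    $tmp = tempnam($dir, 'hot');
    file_put_contents($tmp, $json);
    rename($tmp, $file);
    return $json;
}

// Example: refresh at most every 3 hours (fetchBaiduJson() is hypothetical).
// echo cachedJson(__DIR__ . '/data/hot.json', 3 * 3600, 'fetchBaiduJson');
```

The trade-off is the one the post describes: a shorter `$ttl` tracks Baidu more closely, at the cost of more requests to the homepage.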

How to use:

API URL: https://api.zyooo.com/bd_rebang/

Demo: https://api.zyooo.com/bd_rebang/cs.php

Response fields:

name => news title; url => the corresponding Baidu news URL; rudu => heat score (how hot the topic is)

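The response is a JSON array. At the time of writing it contained 29 entries, indexed 0 to 28, so when you use it you only need to choose how many entries to show while iterating. I currently use the first 8. If you don't want to write a loop, you can simply pick those entries by index. I recommend showing only the top few.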
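A consumer that follows this advice might look like the sketch below. The function names are hypothetical; the only things taken from the post are the field names (`name`, `url`, `rudu`) and the idea of keeping the first 8 entries. Fetching is left out so the functions work on any JSON string; every field is passed through `htmlspecialchars()` because the titles come from a third-party page.

```php
<?php
// Decode the API's JSON array and keep the first $n entries (the post uses 8).
function topHotSearch(string $json, int $n = 8): array
{
    $items = json_decode($json, true);
    if (!is_array($items)) {
        return []; // not valid JSON, or not an array
    }
    return array_slice($items, 0, $n);
}

// Render entries as an HTML list of links, escaping every field.
function renderHotSearch(array $items): string
{
    $html = "<ul>\n";
    foreach ($items as $item) {
        $html .= sprintf(
            "<li><a href=\"%s\">%s</a> (%s)</li>\n",
            htmlspecialchars($item['url'] ?? '', ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($item['name'] ?? '', ENT_QUOTES, 'UTF-8'),
            htmlspecialchars((string) ($item['rudu'] ?? ''), ENT_QUOTES, 'UTF-8')
        );
    }
    return $html . "</ul>\n";
}

// Usage against the live API (requires allow_url_fopen and network access):
// echo renderHotSearch(topHotSearch(file_get_contents('https://api.zyooo.com/bd_rebang/'), 8));
```

Using `array_slice` rather than fixed indexes also means the page still renders if the API returns fewer than 8 entries on a given day.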


Code:

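<?php
// Serve Baidu's hot search list as JSON, cached in one txt file per day.
header('Content-Type: application/json; charset=UTF-8');
// Assumption: use China Standard Time so the daily file rolls over at Beijing midnight.
date_default_timezone_set('Asia/Shanghai');

$filename = __DIR__ . '/data/' . date('Y-m-d') . '.txt';

if (is_file($filename)) {
    // Today's data is already cached: return it without contacting Baidu.
    readfile($filename);
    exit;
}

$html = Curl('https://www.baidu.com');
// '#' is the regex delimiter so the '/' in </textarea> needs no escaping
// (with '/' as delimiter the original pattern failed to compile);
// the 's' modifier lets '.' match newlines inside the textarea.
if ($html === false || !preg_match('#<textarea id="hotsearch_data"[^>]*>(.*?)</textarea>#s', $html, $matches)) {
    http_response_code(502);
    exit(json_encode(['error' => 'Failed to fetch or parse the Baidu homepage']));
}

// Data processing: keep only the fields we need.
$js = json_decode($matches[1], true);
$new_array = [];
foreach ($js['hotsearch'] ?? [] as $v) {
    $new_array[] = [
        'name' => $v['pure_title'] ?? '',
        'url'  => urldecode($v['linkurl'] ?? ''),
        'rudu' => $v['heat_score'] ?? '',
    ];
}
if ($new_array === []) {
    // Do not cache an empty list for the whole day.
    http_response_code(502);
    exit(json_encode(['error' => 'No hot search entries found']));
}
$aaa = json_encode($new_array, JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);

// Write the data to today's txt file, creating data/ if it does not exist yet.
if (!is_dir(dirname($filename))) {
    mkdir(dirname($filename), 0755, true);
}
file_put_contents($filename, $aaa, LOCK_EX);
echo $aaa;

// curl helper for GET requests; returns the response body, or false on failure.
function Curl($url)
{
    // Set a desktop browser User-Agent.
    $UserAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36';
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_USERAGENT, $UserAgent);
    // TLS certificate checks are disabled, as in the original. This is insecure;
    // remove these two lines if your PHP installation has a CA bundle.
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_TIMEOUT, 10); // do not hang if Baidu is slow
    $response = curl_exec($curl);
    curl_close($curl);
    return $response;
}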
