Taking Web Page Screenshots on a Server with Selenium

1. Use case

Take screenshots of specific web pages on a server.

2. Tools

  1. selenium
  2. xvfb

3. Development process

1. Implementing the screenshot function in a Python script

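from selenium import webdriver

# Python 2 syntax; on Python 3 use input() instead of raw_input()
url = raw_input("Please input URL:")
save_fn = raw_input("Please input filename:")

browser = webdriver.Firefox()
try:
    browser.set_window_size(1200, 900)
    browser.get(url)
    browser.save_screenshot(save_fn)
finally:
    # quit() closes every window and ends the driver process, even if the capture fails
    browser.quit()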
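
The steps in the script above can be wrapped in a reusable function. The following is a minimal sketch rather than the original author's code: `capture_page` and its `make_browser` parameter are hypothetical names, and the factory is assumed to return an object exposing the WebDriver methods `set_window_size`, `get`, `save_screenshot` and `quit`.

```python
def capture_page(make_browser, url, save_fn, size=(1200, 900)):
    """Open url in a fresh browser, save a screenshot to save_fn, always quit.

    make_browser is a hypothetical zero-argument factory (for example
    webdriver.Firefox); passing it in keeps this function independent
    of any particular browser.
    """
    browser = make_browser()
    try:
        browser.set_window_size(*size)
        browser.get(url)
        # Return whatever save_screenshot reports (a success flag in Selenium)
        return browser.save_screenshot(save_fn)
    finally:
        # Runs even if loading or saving raises, so no browser process is leaked
        browser.quit()
```

With Selenium installed, a call might look like `capture_page(webdriver.Firefox, url, "capture.png")`.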

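2. Taking screenshots on a server without X Window

xvfb acts as a wrapper that provides a virtual X server to applications.
Xvfb performs all graphical operations in memory and never sends the image to a physical screen, so graphical programs can run even when the machine has no X Window session.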

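sudo apt-get install xvfb
# Only needed when Xvfb is started manually on display :10;
# xvfb-run starts its own Xvfb and sets DISPLAY for the wrapped command
export DISPLAY=:10
xvfb-run python capture.py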
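
The same invocation can be launched from Python, for example from a scheduler. This is a sketch under assumptions: `xvfb_command` and `run_under_xvfb` are hypothetical helper names, and the screen size is passed explicitly because the default virtual screen may be smaller than the 1200x900 browser window used earlier.

```python
import subprocess


def xvfb_command(script, width=1200, height=900, depth=24):
    """Build an xvfb-run command line that runs a Python script."""
    screen = "-screen 0 {}x{}x{}".format(width, height, depth)
    return [
        "xvfb-run",
        "--auto-servernum",         # let xvfb-run pick a free display number
        "--server-args=" + screen,  # options forwarded to the Xvfb server
        "python",
        script,
    ]


def run_under_xvfb(script):
    """Run the script under xvfb-run and return its exit code."""
    return subprocess.call(xvfb_command(script))
```

Calling `run_under_xvfb("capture.py")` corresponds to the shell command shown above, with the display number chosen automatically.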

3. Choosing a browser

PhantomJS, Firefox, Chrome

apt-get install phantomjs

driver = webdriver.PhantomJS()    

Why not PhantomJS: it handles dynamically loaded pages poorly.
Chrome requires downloading chromedriver first and passing its path when creating the driver:

chrome_options = webdriver.ChromeOptions()
browser = webdriver.Chrome(executable_path='/usr/lib/chromium-browser/chromedriver', chrome_options=chrome_options)

Firefox may fail to start with an error like this:

File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/firefox_binary.py", line 99, in _wait_until_connectable
    "The browser appears to have exited "
selenium.common.exceptions.WebDriverException: Message: The browser appears to have exited before we could connect. If you specified a log_file in the FirefoxBinary constructor, check it for details.

Possible fixes:
  1. Upgrade selenium
  2. Install an older version of Firefox:
     https://www.liberiangeek.net/2012/04/how-to-install-previous-versions-of-firefox-in-ubuntu-12-04-precise-pangolin/

4. Problems encountered when taking screenshots:

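Content that loads only as the page scrolls is missing from the screenshot. Scrolling through the page in steps before capturing triggers it; the script appends "scroll-done" to the page title once it finishes:

browser.execute_script("""
            (function () {
                var y = 0;
                var step = 100;
                window.scroll(0, 0);

                function f() {
                    if (y < document.body.scrollHeight) {
                        y += step;
                        window.scroll(0, y);
                        setTimeout(f, 50);
                    } else {
                        window.scroll(0, 0);
                        document.title += "scroll-done";
                    }
                }

                setTimeout(f, 1000);
            })();
        """)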
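
A JavaScript alert can block the page. Dismiss it if one is present:

from selenium.common.exceptions import NoAlertPresentException  # Selenium 2.16 or later

browser.get(url)   # Load page
try:
    # Some Selenium versions raise as soon as switch_to.alert is accessed,
    # so the lookup stays inside the try block
    browser.switch_to.alert.dismiss()
except NoAlertPresentException:
    pass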
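
Once the page is loaded and the scrolling script from the start of this section has been started, the Python side has to wait for the "scroll-done" marker in the page title before taking the screenshot, because `execute_script` returns while the scrolling is still in progress. A minimal polling sketch, assuming a hypothetical helper name `wait_for_title_marker` and relying only on the WebDriver `title` property:

```python
import time


def wait_for_title_marker(browser, marker="scroll-done", timeout=30.0, interval=0.5):
    """Poll browser.title until it contains marker; return False on timeout."""
    deadline = time.time() + timeout
    while True:
        if marker in browser.title:
            return True
        if time.time() >= deadline:
            return False
        time.sleep(interval)
```

Call it between `execute_script` and `save_screenshot`, and skip or retry the capture when it returns False. Selenium's own `WebDriverWait` combined with `expected_conditions.title_contains` is an alternative to hand-written polling.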
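
A page that never finishes loading would hang the script, so a signal-based decorator puts an upper bound on each capture:

import functools
import signal


class CaptureTimeout(Exception):
    """Raised inside the decorated call when the alarm fires."""


def timeout(seconds, error_message="Timeout Error: the call did not finish in time."):
    """Return error_message instead of the result if func runs longer than `seconds`.

    Relies on SIGALRM, so it works only on Unix and only in the main thread.
    """
    def decorated(func):
        def _handle_timeout(signum, frame):
            raise CaptureTimeout(error_message)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            signal.signal(signal.SIGALRM, _handle_timeout)
            signal.alarm(seconds)
            try:
                return func(*args, **kwargs)
            except CaptureTimeout:
                return error_message
            finally:
                signal.alarm(0)  # always cancel the pending alarm

        return wrapper

    return decorated


@timeout(3 * 60, 'Timed out, exiting')
def capture(url, save_fn="capture.png"):
    ...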
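
When only part of a function needs a time limit, the same SIGALRM technique can be written as a context manager. This is a sketch, not the original author's code: `time_limit` and `TimeLimitExceeded` are hypothetical names, and like the decorator it assumes a Unix system and the main thread.

```python
import signal
from contextlib import contextmanager


class TimeLimitExceeded(Exception):
    """Hypothetical exception type used only in this sketch."""


@contextmanager
def time_limit(seconds):
    """Raise TimeLimitExceeded if the with-block runs longer than seconds."""
    def _on_alarm(signum, frame):
        raise TimeLimitExceeded("block exceeded %s seconds" % seconds)

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # accepts fractions of a second
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)    # cancel the timer
        signal.signal(signal.SIGALRM, previous)    # restore the earlier handler
```

For example, `with time_limit(60): browser.get(url)` bounds only the page load, and restoring the previous handler keeps the helper from interfering with other code that uses SIGALRM.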

To load pages through a proxy in Firefox, set the proxy preferences on the profile (a SOCKS proxy here; the HTTP and SSL variants are commented out):

profile = webdriver.FirefoxProfile()
profile.set_preference('network.proxy.type', 1)  # 1 = manual proxy configuration
# profile.set_preference("network.proxy.http", "127.0.0.1")
# profile.set_preference("network.proxy.http_port", 1080)
# profile.set_preference("network.proxy.ssl", "127.0.0.1")
# profile.set_preference("network.proxy.ssl_port", 1080)
profile.set_preference('network.proxy.socks', "127.0.0.1")
profile.set_preference('network.proxy.socks_port', 1080)
profile.update_preferences()

browser = webdriver.Firefox(firefox_profile=profile)

With Chrome, the proxy is passed as a command-line argument:

proxy = '127.0.0.1:1080'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=socks5://%s' % proxy)

browser = webdriver.Chrome(executable_path='/usr/lib/chromium-browser/chromedriver', chrome_options=chrome_options)
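
Both proxy setups above use the same host and port, so the values can be kept in one place. A small sketch with hypothetical helper names (`firefox_socks_prefs`, `apply_prefs`, `chrome_socks_argument`); the preference names and the `--proxy-server` argument are the ones used above.

```python
def firefox_socks_prefs(host, port):
    """Return the Firefox preferences for a manual SOCKS proxy."""
    return {
        "network.proxy.type": 1,  # 1 = manual proxy configuration
        "network.proxy.socks": host,
        "network.proxy.socks_port": int(port),
    }


def apply_prefs(profile, prefs):
    """Copy every preference onto an object with a set_preference method."""
    for name, value in prefs.items():
        profile.set_preference(name, value)
    return profile


def chrome_socks_argument(host, port):
    """Return the Chrome command-line argument for a SOCKS5 proxy."""
    return "--proxy-server=socks5://%s:%d" % (host, int(port))
```

With Selenium, `apply_prefs(webdriver.FirefoxProfile(), firefox_socks_prefs("127.0.0.1", 1080))` configures a Firefox profile, and `chrome_options.add_argument(chrome_socks_argument("127.0.0.1", 1080))` does the same for Chrome.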
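
When clicking an element opens the target page in a new window, switch to that window before capturing:

import time
from selenium.webdriver.common.by import By

browser.find_elements(By.XPATH, xpath)[0].click()  # xpath: XPath of the element to click
time.sleep(10)  # give the new window time to open and load
# The loop ends on the last handle, typically the most recently opened window
for handle in browser.window_handles:
    browser.switch_to.window(handle)
browser.save_screenshot(save_fn)
browser.quit()  # close every window, not only the current one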
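
Looping over all handles depends on the order in which they are reported. A more explicit variant, sketched here with the hypothetical helper name `switch_to_new_window`, records the handles before the click and switches to whichever handle appeared afterwards. It uses only the WebDriver attributes `window_handles` and `switch_to.window` seen above.

```python
def switch_to_new_window(browser, handles_before):
    """Switch to the first window handle that is not in handles_before.

    Returns the new handle, or None if no new window was opened.
    """
    known = set(handles_before)
    for handle in browser.window_handles:
        if handle not in known:
            browser.switch_to.window(handle)
            return handle
    return None
```

Usage: store `before = browser.window_handles` before clicking, click, wait, then call `switch_to_new_window(browser, before)`; a return value of None means the click did not open a new window.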
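
Temporary files left behind by the browser pile up in /tmp, so they are cleaned up after capturing:

import os

def delete_file_folder(src):
    '''Recursively delete a file or a folder, ignoring entries that cannot be removed.'''
    if os.path.isfile(src):
        try:
            os.remove(src)
        except OSError:
            pass
    elif os.path.isdir(src):
        for item in os.listdir(src):
            itemsrc = os.path.join(src, item)
            delete_file_folder(itemsrc)
        try:
            os.rmdir(src)
        except OSError:
            pass

# Warning: this removes everything under /tmp that the current user is allowed to delete
delete_file_folder('/tmp')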
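
Deleting all of /tmp can remove files that other programs still need. A more conservative sketch removes only entries whose name starts with a given prefix and that have not been modified for a while. `remove_stale_entries` is a hypothetical helper, and the prefix passed to it is an assumption that should be checked against the names that actually appear on the server.

```python
import os
import shutil
import time


def remove_stale_entries(root, prefix, max_age_seconds):
    """Delete entries directly under root whose name starts with prefix
    and whose modification time is older than max_age_seconds.

    Returns the sorted names of the removed entries.
    """
    cutoff = time.time() - max_age_seconds
    removed = []
    for name in os.listdir(root):
        if not name.startswith(prefix):
            continue
        path = os.path.join(root, name)
        try:
            if os.path.getmtime(path) >= cutoff:
                continue  # recently modified, possibly still in use
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
        except OSError:
            continue  # already gone or not ours to delete
        removed.append(name)
    return sorted(removed)
```

For example, `remove_stale_entries("/tmp", "tmp", 3600)` would remove entries named `tmp...` that are older than one hour, assuming that prefix matches the browser's temporary directories.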

References

  1. Firefox proxy settings
  2. How to Install Previous Versions of Firefox