Tip-Review-Algorithm, Issue 4

Posted on 2020-12-20, updated on 2024-06-06, filed under Check-in Log

1. Fetching data with cookies. 2. Apple's SwiftUI tutorial. 3. Longest Consecutive Sequence.

Tip

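BinaryCookieReader helps Python request pages that require cookies

Lately I have been writing Python scripts to streamline my workflow: switching accounts, scraping lists, downloading and unzipping items one by one, renaming files, and so on. All of this can be automated, but one thing kept blocking me: the pages require a logged-in account, so a plain requests call fails with Authentication Failed. After some digging I found BinaryCookieReader, written by Satishb3, which reads the cookies that Safari and iOS apps store in a binary file, so they can be sent along with the request. You do need to have logged in through Safari first so that the cookie exists, but that is a minor point.

Advantage: the cookie-based simulated-login crawlers found online usually make you dig the cookie out of the browser and paste it into the Python script, which is tedious. This approach takes a single step, and if no cookie is cached you only need to follow the prompt and log in once.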

# Imports
import os
import sys
from io import BytesIO                                        # The cookie file is binary, so Python 3 needs BytesIO rather than StringIO
from os.path import expanduser
from struct import unpack
from time import gmtime, strftime

import requests

# Path to Safari's cookie file
home = expanduser("~")
FilePath = home + '/Library/Cookies/Cookies.binarycookies'
example_url = 'https://www.example.com/'
cookieDict = dict()
MAC_EPOCH_OFFSET = 978307200                                  # Unix timestamp of 1 Jan 2001, the start of the Mac epoch
FLAG_NAMES = {0: '', 1: 'Secure', 4: 'HttpOnly', 5: 'Secure; HttpOnly'}

try:
    binary_file = open(FilePath, 'rb')
except IOError:
    print('Cannot open file: ' + FilePath)
    print('Go to System Preferences -> Security & Privacy -> Full Disk Access and tick the current Terminal to grant access to the cookie file')
    sys.exit(1)

# Read the cookies, following BinaryCookieReader
file_header = binary_file.read(4)                             # File magic string: cook

if file_header != b'cook':                                    # Compare bytes with bytes
    print("Not a Cookies.binarycookies file")
    sys.exit(1)

num_pages = unpack('>i', binary_file.read(4))[0]              # Number of pages in the file: 4 bytes, big-endian
page_sizes = [unpack('>i', binary_file.read(4))[0] for _ in range(num_pages)]  # Each page size: 4 bytes per page
pages = [binary_file.read(ps) for ps in page_sizes]           # Each page contains one or more cookies
binary_file.close()


def read_cstring(buf, offset):
    # Read the NUL-terminated string that starts at offset and decode it
    buf.seek(offset)
    data = b''
    ch = buf.read(1)
    while ch not in (b'', b'\x00'):
        data += ch
        ch = buf.read(1)
    return data.decode('utf-8', 'replace')


for page in pages:
    page = BytesIO(page)                                      # Wrap the bytes in a file-like object so read/seek work
    page.read(4)                                              # Page header: 4 bytes, always 00000100
    num_cookies = unpack('<i', page.read(4))[0]               # Number of cookies in this page, little-endian
    cookie_offsets = [unpack('<i', page.read(4))[0] for _ in range(num_cookies)]  # Cookie start offsets from the page start
    page.read(4)                                              # End of page header: always 00000000

    for offset in cookie_offsets:
        page.seek(offset)                                     # Move to the start of the cookie
        cookiesize = unpack('<i', page.read(4))[0]            # Cookie size
        cookie = BytesIO(page.read(cookiesize))               # Read the complete cookie

        cookie.read(4)                                        # Unknown
        flags = unpack('<i', cookie.read(4))[0]               # Cookie flags: 1=secure, 4=httponly, 5=secure+httponly
        cookie_flags = FLAG_NAMES.get(flags, 'Unknown')
        cookie.read(4)                                        # Unknown

        urloffset = unpack('<i', cookie.read(4))[0]           # Domain offset from the cookie start
        nameoffset = unpack('<i', cookie.read(4))[0]          # Name offset from the cookie start
        pathoffset = unpack('<i', cookie.read(4))[0]          # Path offset from the cookie start
        valueoffset = unpack('<i', cookie.read(4))[0]         # Value offset from the cookie start
        cookie.read(8)                                        # End of cookie header

        expiry_date_epoch = unpack('<d', cookie.read(8))[0] + MAC_EPOCH_OFFSET  # Expiry date is stored in Mac epoch format
        expiry_date = strftime("%a, %d %b %Y", gmtime(expiry_date_epoch))
        create_date_epoch = unpack('<d', cookie.read(8))[0] + MAC_EPOCH_OFFSET  # Cookie creation time
        create_date = strftime("%a, %d %b %Y", gmtime(create_date_epoch))

        # Offsets count from the start of the cookie record, but this buffer starts after the 4-byte size field
        url = read_cstring(cookie, urloffset - 4)
        name = read_cstring(cookie, nameoffset - 4)
        path = read_cstring(cookie, pathoffset - 4)
        value = read_cstring(cookie, valueoffset - 4)

        # Keep only the cookies for the site we need
        if url == ".example.com":
            # print('Cookie : ' + name + '=' + value + '; domain=' + url + '; path=' + path + '; expires=' + expiry_date + '; ' + cookie_flags)
            cookieDict[name] = value

# Print the cookies
print("Cookies:")
print(cookieDict)
# If nothing was found, prompt to open the site in Safari and log in so that the cookie gets saved
if not cookieDict:
    print('No cookies found for www.example.com')
    print('Please open https://www.example.com in Safari to refresh the cookies')
    os.system('open -a Safari https://www.example.com/')
    sys.exit(1)

# Send the request with the cookies
resp = requests.get(example_url, cookies=cookieDict)
# Process the response from here on...
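
The script relies on two conventions that are easy to get wrong: the file header is read big-endian ('>i') while the fields inside each page are little-endian, and timestamps count seconds from 1 Jan 2001 rather than 1970. Below is a minimal sketch for checking both without touching a real cookie file. The helper names read_page_sizes and mac_to_datetime and the synthetic header bytes are made up for illustration and are not part of BinaryCookieReader.

```python
from datetime import datetime, timezone
from io import BytesIO
from struct import pack, unpack

MAC_EPOCH_OFFSET = 978307200  # seconds between 1970-01-01 and 2001-01-01 (UTC)


def read_page_sizes(stream):
    """Return the page sizes listed in a binarycookies-style file header."""
    if stream.read(4) != b'cook':
        raise ValueError('missing "cook" magic string')
    num_pages = unpack('>i', stream.read(4))[0]  # big-endian page count
    return [unpack('>i', stream.read(4))[0] for _ in range(num_pages)]


def mac_to_datetime(mac_seconds):
    """Convert seconds since 2001-01-01 into an aware UTC datetime."""
    return datetime.fromtimestamp(mac_seconds + MAC_EPOCH_OFFSET, tz=timezone.utc)


# Synthetic header with made-up values: magic, 2 pages, sizes 16 and 32
header = b'cook' + pack('>i', 2) + pack('>i', 16) + pack('>i', 32)
sizes = read_page_sizes(BytesIO(header))

# Timestamps are stored as little-endian doubles; round-trip one day past the Mac epoch
one_day = unpack('<d', pack('<d', 86400.0))[0]
print(sizes, mac_to_datetime(one_day).date())
```

Feeding the first bytes of a real Cookies.binarycookies file to read_page_sizes should give the same page sizes the script computes, which makes it a cheap check if the parser ever misbehaves.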

Review

Develop Apps with SwiftUI

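Apple's tutorial on building apps with SwiftUI. It has 8 chapters with an estimated duration of about 4 hours, though actually working through it will certainly take longer. The tutorial walks you through creating an app called "Scrumdinger" for managing scrums.

SwiftUI tutorial

The eight parts cover:

Creating an adaptive user interface for the app, adding and modifying views, then improving the app's accessibility;
Learning common language patterns and core frameworks while building the app's view hierarchy;
Building a navigation hierarchy for the app and presenting a modal view with its own navigation stack;
Passing data, using Binding to keep data in sync between views;
Managing state so the app responds dynamically to changes, creating a robust scrum model and using it in the app;
Persisting data;
Drawing: drawing 2D shapes to create a progress ring that shows meeting progress;
Recording audio: respecting the user's privacy when requesting access to hardware or sensitive data, then recording audio and transcribing it to text.

The tutorial holds your hand step by step and comes straight from Apple, which is nice.

There is also another SwiftUI tutorial, Introducing SwiftUI, which I would recommend as well; it is already on my list.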

Algorithm

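Longest Consecutive Sequence

Description:

Given an unsorted integer array nums, find the length of the longest sequence of consecutive numbers (the elements need not be adjacent in the original array).

Follow-up: can you design and implement a solution with O(n) time complexity?

Approach:

Trade space for time. Put every value into a hash set, then start counting only from values that are the left boundary of a run (values whose predecessor is not in the set) and extend each run as far as it goes. Each value is visited at most twice, so the total work stays O(n).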

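/*
 * @lc app=leetcode.cn id=128 lang=cpp
 *
 * [128] Longest Consecutive Sequence
 */

// Headers needed to compile outside the LeetCode judge
#include <algorithm>
#include <unordered_set>
#include <vector>
using namespace std;

// @lc code=start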
class Solution {
public:
    int longestConsecutive(vector<int>& nums) {
        unordered_set<int> hashSet;
        for(int num : nums) {
            hashSet.insert(num);
        }

        int max_len = 0;

        for(int num : hashSet) {
            if (!hashSet.count(num-1))	// Only start counting from the left boundary of a run
            {
                int max_num = num;
                while (hashSet.count(max_num))
                {
                    max_num++;
                }
                max_len = max(max_len, max_num-num);
            }
        }
        return max_len;
    }
};
// @lc code=end
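
For comparison, the same idea can be sketched in Python. This is only an illustrative port of the C++ solution above, and the function name longest_consecutive is my own choice rather than anything LeetCode prescribes.

```python
def longest_consecutive(nums):
    """Return the length of the longest run of consecutive integers in nums."""
    seen = set(nums)                  # hash set gives O(1) average membership tests
    best = 0
    for num in seen:
        if num - 1 not in seen:       # num is the left boundary of a run
            end = num
            while end in seen:        # walk right until the run breaks
                end += 1
            best = max(best, end - num)
    return best


print(longest_consecutive([100, 4, 200, 1, 3, 2]))  # prints 4 (the run 1, 2, 3, 4)
```

Skipping every value that has a predecessor in the set is what keeps the inner loop from rescanning the same run, so the total work stays linear on average.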