### Code that crawls Wikipedia and stores the data in a MySQL DB (infinite loop)
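The script assumes a `scraping` database containing a `pages` table with `title` and `content` columns. Below is a minimal one-time setup sketch using pymysql; the column types and the extra `id`/`created` columns are assumptions, not part of the original code.

import pymysql

# One-time setup: create the `scraping` database and the `pages` table the crawler writes to.
# Column types and the id/created columns are assumptions.
conn = pymysql.connect(host='127.0.0.1', user='root',
                       passwd='your password', charset='utf8')
cur = conn.cursor()
cur.execute('CREATE DATABASE IF NOT EXISTS scraping')
cur.execute('USE scraping')
cur.execute('''
    CREATE TABLE IF NOT EXISTS pages (
        id INT NOT NULL AUTO_INCREMENT,
        title VARCHAR(200),
        content TEXT,
        created TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (id)
    )
''')
cur.close()
conn.close()

With the table in place, the crawler itself: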
from urllib.request import urlopen
from bs4 import BeautifulSoup
import datetime
import random
import pymysql
import re

conn = pymysql.connect(host='127.0.0.1', user='root', passwd='your password', db='mysql', charset='utf8')
cur = conn.cursor()
cur.execute("USE scraping")

random.seed(datetime.datetime.now().timestamp())  # numeric seed; newer Pythons deprecate non-numeric seeds

def store(title, content):
    # Let pymysql escape and quote the values; no quotes around the %s placeholders
    cur.execute('INSERT INTO pages (title, content) VALUES (%s, %s)',
                (title, content))
    cur.connection.commit()

def getLinks(articleUrl):
    html = urlopen('http://en.wikipedia.org' + articleUrl)
    bs = BeautifulSoup(html, 'html.parser')
    # Store the article title and the first paragraph of its body text
    title = bs.find('h1').get_text()
    content = bs.find('div', {'id': 'mw-content-text'}).find('p').get_text()
    store(title, content)
    # Return only internal article links (/wiki/... paths without a colon)
    return bs.find('div', {'id': 'bodyContent'}).findAll(
        'a', href=re.compile('^(/wiki/)((?!:).)*$'))

links = getLinks('/wiki/Kevin_Bacon')
try:
    while len(links) > 0:
        newArticle = links[random.randint(0, len(links) - 1)].attrs['href']
        print(newArticle)
        links = getLinks(newArticle)
finally:
    cur.close()
    conn.close()
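To spot-check what the crawler has stored, the rows can be read back with the same connection settings. This is a minimal verification sketch, assuming only the `pages` table with the `title` and `content` columns used above; the 80-character preview length is arbitrary.

import pymysql

# Print a few stored pages with a short preview of their content.
conn = pymysql.connect(host='127.0.0.1', user='root', passwd='your password',
                       db='scraping', charset='utf8')
cur = conn.cursor()
cur.execute('SELECT title, LEFT(content, 80) FROM pages LIMIT 5')
for title, preview in cur.fetchall():
    print(title, '-', preview)
cur.close()
conn.close()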
