The Relationship between news and stocks 3
2022. 7. 16. 02:18ㆍProject/Predicting stock price rises and falls from news articles
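Meeting notes
- Weighed CV and NLP topics
- Judged a CV topic infeasible because StyleGAN would demand more computing performance than we have
- Chose the NLP topic: predicting stock price rises and falls from news articles
- Collect stock prices and related news data for the KOSPI 200 companies from the Naver stock site
Code
Creating the class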
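import pymysql
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

class Crolling:
    def __init__(self):
        # MySQL connection; the password and host are left blank in this post.
        self.conn = pymysql.connect(user='stocks',
                                    passwd='',
                                    host="",
                                    port=3306,
                                    db='Data',
                                    charset='utf8')
        self.cur = self.conn.cursor()
        # Selenium 4 expects the driver path wrapped in a Service object.
        self.driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
        self.driver.implicitly_wait(3)
        self.driver.maximize_window()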
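The two methods that follow write to the tables Stock_ID and Stock_Price with INSERT IGNORE, which only skips duplicates when the table has a primary or unique key. The post does not show the table definitions, so the sketch below is a guess: the column names come from the INSERT statements, while the types and keys are assumptions. To keep it runnable without a MySQL server it uses Python's built-in sqlite3, where the equivalent statement is spelled INSERT OR IGNORE, and the price values are made up.

```python
import sqlite3

# Hypothetical schema: column names come from the INSERT statements in this
# post, while the types and key choices are assumptions.
SCHEMA = """
CREATE TABLE Stock_ID (
    id   TEXT PRIMARY KEY,   -- stock code taken from the link href
    name TEXT NOT NULL
);
CREATE TABLE Stock_Price (
    stock_id      TEXT NOT NULL,
    date          TEXT NOT NULL,
    closing_price INTEGER,
    market_price  INTEGER,
    high_price    INTEGER,
    low_price     INTEGER,
    PRIMARY KEY (stock_id, date)   -- one row per stock per trading day
);
"""

def insert_price(conn, row):
    """Insert one price row; a row with an existing (stock_id, date) is skipped."""
    # SQLite spells the statement INSERT OR IGNORE; MySQL spells it INSERT IGNORE.
    conn.execute(
        "INSERT OR IGNORE INTO Stock_Price VALUES (?, ?, ?, ?, ?, ?)", row
    )

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
row = ("005930", "2022.03.31", 100, 101, 102, 99)  # made-up numbers
insert_price(conn, row)
insert_price(conn, row)  # ignored: the (stock_id, date) key already exists
count = conn.execute("SELECT COUNT(*) FROM Stock_Price").fetchone()[0]
```

With a composite key like this, running the crawler again over the same pages should not create duplicate price rows.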
Collecting the unique IDs of the KOSPI 200 companies
    def Stock_ID(self):
        # The KOSPI 200 constituent list spans 20 pages.
        for i in range(1, 21):
            url = f'https://finance.naver.com/sise/entryJongmok.naver?&page={i}'
            self.driver.get(url)
            self.driver.implicitly_wait(5)
            html = self.driver.page_source
            soup = BeautifulSoup(html, 'html.parser')
            # Rows 3 to 12 of the table hold the ten companies on each page.
            for j in range(3, 13):
                link = soup.select_one(f'body > div > table.type_1 > tbody > tr:nth-child({j}) > td.ctg > a')
                if link is None:
                    continue  # skip rows without a company link
                # The stock code is the query value in the link's href.
                sql = (link['href'].split("=")[1], link.text)
                self.cur.execute('INSERT IGNORE INTO Stock_ID (id, name) VALUES (%s ,%s)', sql)
            self.conn.commit()
            print(f'Page {i}/20 done')
        # Stock_Price needs a live driver, so create a new Crolling instance before calling it.
        self.driver.quit()
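Splitting the href on "=" works as long as the link carries a single query parameter. A slightly more defensive variant, sketched here with only the standard library, reads the `code` parameter by name. The helper name `extract_stock_code` and the example href shapes are assumptions for illustration, not taken from the Naver page.

```python
from urllib.parse import urlparse, parse_qs

def extract_stock_code(href):
    """Return the value of the `code` query parameter, or None if it is absent."""
    # parse_qs maps each parameter name to a list of values.
    query = parse_qs(urlparse(href).query)
    values = query.get("code")
    return values[0] if values else None

# Example hrefs (assumed shapes).
single = extract_stock_code("/item/main.naver?code=005930")
reordered = extract_stock_code("/item/main.naver?page=2&code=000660")
missing = extract_stock_code("/item/main.naver")
```

Returning None for a link without a `code` parameter lets the caller skip the row instead of raising an IndexError.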
Collecting stock prices (March 31 to July 18)
    def Stock_Price(self):
        # Load every stock code collected by Stock_ID.
        self.cur.execute('SELECT id FROM Stock_ID;')
        stock_id = self.cur.fetchall()
        for id in stock_id:
            # Pages list the newest prices first, so page backwards in time.
            for i in range(1, 21):
                url = f'https://finance.naver.com/item/sise_day.naver?code={id[0]}&page={i}'
                self.driver.get(url)
                self.driver.implicitly_wait(5)
                html = self.driver.page_source
                soup = BeautifulSoup(html, 'html.parser')
                # Only these table rows hold daily prices.
                for j in [3,4,5,6,7,11,12,13,14,15]:
                    date = soup.select_one(f'body > table.type2 > tbody > tr:nth-child({j}) > td:nth-child(1) > span').text
                    closing_price = soup.select_one(f'body > table.type2 > tbody > tr:nth-child({j}) > td:nth-child(2) > span').text.replace(',','')
                    market_price = soup.select_one(f'body > table.type2 > tbody > tr:nth-child({j}) > td:nth-child(4) > span').text.replace(',','')
                    high_price = soup.select_one(f'body > table.type2 > tbody > tr:nth-child({j}) > td:nth-child(5) > span').text.replace(',','')
                    low_price = soup.select_one(f'body > table.type2 > tbody > tr:nth-child({j}) > td:nth-child(6) > span').text.replace(',','')
                    sql = (id[0], date, closing_price, market_price, high_price, low_price)
                    self.cur.execute('INSERT IGNORE INTO Stock_Price (stock_id , date, closing_price, market_price, high_price, low_price) VALUES (%s ,%s ,%s ,%s ,%s ,%s)', sql)
                    self.conn.commit()
                    # Stop reading rows once the first day of the window is stored.
                    if date == '2022.03.31':
                        break
                else:
                    # The start date was not on this page, so load the next page.
                    continue
                # The start date was reached, so move on to the next stock.
                break
        self.conn.close()
        self.driver.quit()
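The crawler stores dates and prices as the strings shown on the page and stops on an exact match with '2022.03.31'. A possible refinement, sketched below under the assumption that dates always look like 'YYYY.MM.DD' and prices like '71,300', converts both to typed values and stops on any row on or before the start date, which would also cover a stock with no row for March 31. The helper names are hypothetical.

```python
from datetime import date, datetime

START_DATE = date(2022, 3, 31)  # first day of the collection window in this post

def parse_date(text):
    """Convert a 'YYYY.MM.DD' string into a date object."""
    return datetime.strptime(text.strip(), "%Y.%m.%d").date()

def parse_price(text):
    """Convert a price string with thousands separators into an int."""
    return int(text.strip().replace(",", ""))

def reached_start(text, start=START_DATE):
    """Return True once a row is dated on or before the start date."""
    return parse_date(text) <= start
```

Inside the row loop, `if reached_start(date): break` could stand in for the string comparison, and `parse_price` could replace the chained `.replace(',', '')` calls.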