<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>DataCook</title>
    <link>https://datacook.tistory.com/</link>
    <description>Email : lyt970120@gmail.com
</description>
    <language>ko</language>
    <pubDate>Tue, 9 Jun 2026 06:52:19 +0900</pubDate>
    <generator>TISTORY</generator>
    <ttl>100</ttl>
    <managingEditor>Joon09</managingEditor>
    <image>
      <title>DataCook</title>
      <url>https://tistory1.daumcdn.net/tistory/4187085/attach/6f66645f1b1f4c84af9cf61b09b8712d</url>
      <link>https://datacook.tistory.com</link>
    </image>
    <item>
      <title>Test-Time Scaling (TTS) Methods for Reasoning LLMs</title>
      <link>https://datacook.tistory.com/137</link>
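      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;Test-Time Scaling (TTS)&lt;/b&gt; is a technique that&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;improves inference performance&lt;/b&gt; without changing the architecture of a trained language model. It dynamically adjusts the amount of computation spent during inference to balance accuracy against efficiency. It has drawn particular attention because it can boost performance&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;with no training cost&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;on reasoning-heavy problems such as math, planning, and logic.&lt;/p&gt;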
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;TTS Overview&lt;/h3&gt;
&lt;div style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;table&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;div data-testid=&quot;table-alignment-container&quot;&gt;
&lt;div data-testid=&quot;table-container&quot; data-layout=&quot;default&quot; data-number-column=&quot;false&quot;&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-table-width=&quot;760&quot; data-table-local-id=&quot;51300749-90d0-4f08-af4e-e4f718d27e70&quot; data-autosize=&quot;false&quot; data-layout=&quot;default&quot; data-number-column=&quot;false&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Item&lt;/td&gt;
&lt;td&gt;Description&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;245&quot;&gt;&lt;span&gt;&lt;b&gt;Purpose&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;514&quot;&gt;&lt;span&gt;Improve inference performance while keeping the model architecture and parameters unchanged&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;245&quot;&gt;&lt;span&gt;&lt;b&gt;When applied&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;514&quot;&gt;&lt;span&gt;At model execution (inference time)&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;245&quot;&gt;&lt;span&gt;&lt;b&gt;Core strategies&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;514&quot;&gt;&lt;span&gt;Multiple sampling, search-based decoding, response reranking and revision&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;245&quot;&gt;&lt;span&gt;&lt;b&gt;Advantages&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;514&quot;&gt;&lt;span&gt;Improves performance without training; suited to applications that need high accuracy at low cost&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size26&quot;&gt;Test-Time Scaling (TTS) Methods &amp;ndash; Consolidated Summary Table&lt;/h2&gt;
&lt;div style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;table&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;div data-testid=&quot;table-alignment-container&quot;&gt;
&lt;div data-testid=&quot;table-container&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot;&gt;
&lt;div&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%; height: 746px;&quot; border=&quot;1&quot; data-table-width=&quot;1182&quot; data-table-local-id=&quot;e8fc6593-0fa7-4234-8026-6317e1b09f97&quot; data-autosize=&quot;false&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr style=&quot;height: 18px;&quot;&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Category&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Representative methods&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Core idea &amp;amp; description&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Characteristics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 113px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td style=&quot;height: 113px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Sampling-based&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 113px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Best-of-N Sampling&lt;br /&gt;Confidence-based Sampling&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 113px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
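&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Generate N responses, then select the best one using a reward model or log-probability scores&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Confidence-based sampling uses probability values to prioritize high-confidence responses&lt;/li&gt;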
&lt;/ul&gt;
&lt;/td&gt;
&lt;td style=&quot;height: 113px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
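&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Ensures diversity&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Simple to implement&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Compute cost grows in proportion to N&lt;/li&gt;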
&lt;/ul&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 113px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td style=&quot;height: 113px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Decoding-based&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 113px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Beam Search&lt;br /&gt;Self-Consistency Decoding&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 113px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
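&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Beam search keeps the N highest-probability candidates and extends them&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Self-consistency generates multiple CoT responses and decides the final answer by majority vote or averaging&lt;/li&gt;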
&lt;/ul&gt;
&lt;/td&gt;
&lt;td style=&quot;height: 113px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
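&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Fixed search strategy&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Improves consistency on complex problems&lt;/li&gt;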
&lt;/ul&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 138px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td style=&quot;height: 138px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Reasoning-based&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 138px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Chain-of-Thought (CoT)&lt;br /&gt;Tree-of-Thought (ToT)&lt;br /&gt;Graph-of-Thought (GoT)&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 138px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
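&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;CoT explicitly elicits a step-by-step &amp;ldquo;flow of thought&amp;rdquo;&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;ToT organizes reasoning as a tree and explores multiple branches&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;GoT connects reasoning steps flexibly in a graph structure&lt;/li&gt;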
&lt;/ul&gt;
&lt;/td&gt;
&lt;td style=&quot;height: 138px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
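&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Strong on complex problem solving&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;CoT is simple; ToT and GoT offer stronger exploration&lt;/li&gt;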
&lt;/ul&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 113px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td style=&quot;height: 113px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Search &amp;amp; Verification-based&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 113px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Search Against Verifiers&lt;br /&gt;Monte Carlo Tree Search (MCTS)&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 113px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
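&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Generate many responses, then score them with a verifier (reward model)&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;MCTS expands and evaluates search paths through rollouts&lt;/li&gt;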
&lt;/ul&gt;
&lt;/td&gt;
&lt;td style=&quot;height: 113px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
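&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Well suited to verifying answers&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Higher compute cost, but higher accuracy as well&lt;/li&gt;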
&lt;/ul&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 113px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td style=&quot;height: 113px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Self-Improvement-based&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 113px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Sequential Revision&lt;br /&gt;Self-Refinement&lt;br /&gt;Chain-of-Action-Thought&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 113px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
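&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;The model iterates through a respond &amp;rarr; critique &amp;rarr; revise cycle&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Improves CoT-based reasoning through the model's own feedback loop&lt;/li&gt;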
&lt;/ul&gt;
&lt;/td&gt;
&lt;td style=&quot;height: 113px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
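&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Iterative improvement without training&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Powerful on complex reasoning&lt;/li&gt;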
&lt;/ul&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 138px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td style=&quot;height: 138px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Compute-optimization-based&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 138px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Compute-Optimal Scaling (COS)&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 138px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
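&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Automatically predicts the difficulty of the input and applies a different compute strategy (sampling, beam search, revision) accordingly&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Easy problems get sequential revision; hard problems get parallel search&lt;/li&gt;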
&lt;/ul&gt;
&lt;/td&gt;
&lt;td style=&quot;height: 138px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
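&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Maximizes compute efficiency&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Cuts computation by 4&amp;times; while maintaining performance&lt;/li&gt;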
&lt;/ul&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
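&lt;p data-ke-size=&quot;size16&quot;&gt;The two simplest rows of the table, Best-of-N sampling and self-consistency, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names best_of_n and self_consistency are hypothetical, the hard-coded (answer, log-prob) pairs stand in for real LLM samples, and the log-prob is used as the score where a reward model could be plugged in instead.&lt;/p&gt;
&lt;pre&gt;
```python
from collections import Counter


def best_of_n(candidates, score_fn):
    """Best-of-N: return the single candidate with the highest score."""
    return max(candidates, key=score_fn)


def self_consistency(final_answers):
    """Self-consistency: majority vote over the final answers of several CoT samples."""
    answer, _count = Counter(final_answers).most_common(1)[0]
    return answer


# Hypothetical samples: (final answer, sequence log-probability) pairs.
samples = [("42", -1.2), ("41", -0.4), ("42", -0.9), ("40", -3.1)]

# Score each sample by its log-probability (a reward model could be used instead).
best = best_of_n(samples, score_fn=lambda pair: pair[1])

# Vote only on the final answers, ignoring the scores.
voted = self_consistency([answer for answer, _ in samples])

print(best, voted)
```
&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;With these toy samples the two methods disagree: Best-of-N returns the single highest-scoring sample (&quot;41&quot;), while the majority vote returns &quot;42&quot;. This is one reason the choice of selection rule matters.&lt;/p&gt;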
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size26&quot;&gt;Supplementary Notes&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Best-of-N vs. Beam Search&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Best-of-N emphasizes diversity; beam search emphasizes probability&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Easy tasks &amp;rarr; Best-of-N is effective; hard tasks &amp;rarr; beam search is effective&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Difference between ToT and GoT&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;ToT: tree-based, sequential exploration&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;GoT: graph-based; earlier responses can be linked and merged (which makes self-refinement easier)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Self-Refinement&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;The model itself performs &quot;critique + improve&quot; &amp;rarr; a way of refining its own CoT&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Highly effective on math, code, and difficult QA&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;COS (Compute-Optimal Scaling)&lt;/b&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Allocates resources &quot;smartly&quot; at test time according to problem difficulty&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Real-world applications exist for GPT-family models (a 4&amp;times; reduction in computation relative to performance)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
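&lt;p data-ke-size=&quot;size16&quot;&gt;The &quot;critique + improve&quot; loop described under Self-Refinement can be sketched as follows. This is only an illustration under stated assumptions: self_refine and the three callables (generate, critique, revise) are hypothetical names, and the toy functions below stand in for prompts sent to the same LLM. The loop stops when the critic has no more feedback or when the round limit is reached.&lt;/p&gt;
&lt;pre&gt;
```python
def self_refine(question, generate, critique, revise, max_rounds=3):
    """Respond, critique, revise: repeat until the critic is satisfied."""
    answer = generate(question)
    for _ in range(max_rounds):
        feedback = critique(question, answer)
        if feedback is None:
            # The critic found nothing to fix, so stop early.
            break
        answer = revise(question, answer, feedback)
    return answer


# Toy stand-ins for LLM calls (hypothetical; a real system would prompt a model).
def toy_generate(question):
    return "2 + 2 = 5"


def toy_critique(question, answer):
    return None if answer.endswith("4") else "The arithmetic is wrong."


def toy_revise(question, answer, feedback):
    return "2 + 2 = 4"


final_answer = self_refine("What is 2 + 2?", toy_generate, toy_critique, toy_revise)
print(final_answer)
```
&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Capping the number of rounds with max_rounds keeps the extra inference cost bounded, which matters because every round adds at least one more model call.&lt;/p&gt;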
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;Pretraining vs. Test-Time Scaling&lt;/h3&gt;
&lt;div style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;table&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;div data-testid=&quot;table-alignment-container&quot;&gt;
&lt;div data-testid=&quot;table-container&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot;&gt;
&lt;div&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-table-width=&quot;926&quot; data-table-local-id=&quot;d382f117-8385-47d5-a91e-4ab7fff5f7c0&quot; data-autosize=&quot;false&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Aspect&lt;/td&gt;
&lt;td&gt;Pretraining&lt;/td&gt;
&lt;td&gt;TTS&lt;span style=&quot;color: #000000; text-align: start;&quot;&gt; (Test-Time Scaling)&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Goal&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Expand the capabilities of the model itself&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Improve performance at inference time&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Cost structure&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;High upfront cost&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Flexible allocation of inference cost&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;What changes&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Model parameters are updated&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Parameters stay fixed&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Strengths&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Acquires new capabilities&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Improves response quality and accuracy&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Weaknesses&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Requires retraining; expensive&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Can be slow; requires real-time optimization&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Representative examples&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Training of GPT-4 and LLaMA&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;CoT, beam search, MCTS, and others&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;Strategic Value of TTS&lt;/h3&gt;
&lt;div style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;table&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;div data-testid=&quot;table-alignment-container&quot;&gt;
&lt;div data-testid=&quot;table-container&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot;&gt;
&lt;div&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-table-width=&quot;924&quot; data-table-local-id=&quot;1524fc17-4e5a-4225-8237-cd0c8754d253&quot; data-autosize=&quot;false&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;span&gt;&lt;b&gt;Perspective&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;&lt;span&gt;Description&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Cost relative to performance&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;With TTS, a small model can match the performance of a large one (offsetting up to a 14&amp;times; difference in model size)&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Inference optimization&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Adjusts the amount of computation to the difficulty of the problem (Compute-Optimal Scaling)&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Difference from prior approaches&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Selects the best reasoning path dynamically at inference time, without training-based improvement (RL, SFT)&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Example applications&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Fixed-compute environments, mobile devices, settings with a limited number of attempts&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
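&lt;p data-ke-size=&quot;size16&quot;&gt;As a rough illustration of adjusting compute to problem difficulty, the sketch below splits a fixed sampling budget across problems in proportion to a difficulty score. The function name &lt;code&gt;allocate_samples&lt;/code&gt;, the difficulty scores, and the proportional rule are all assumptions made for this example; they are not taken from any particular TTS method.&lt;/p&gt;
&lt;pre&gt;
```python
def allocate_samples(difficulties, total_budget, min_samples=1):
    """Split a fixed sample budget across problems in proportion to difficulty.

    difficulties: non-negative scores, one per problem (hypothetical estimates).
    total_budget: total number of samples that may be generated.
    """
    n = len(difficulties)
    if n == 0:
        return []
    # Every problem gets min_samples; the rest is shared out by difficulty.
    spare = max(total_budget - min_samples * n, 0)
    total = sum(difficulties)
    if total == 0:
        shares = [spare / n] * n
    else:
        shares = [spare * d / total for d in difficulties]
    counts = [int(s) for s in shares]
    # Hand the samples lost to rounding to the largest fractional parts.
    leftover = spare - sum(counts)
    order = sorted(range(n), key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return [min_samples + c for c in counts]


print(allocate_samples([1, 3, 6], total_budget=13))  # prints [2, 4, 7]
```
&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Harder problems receive more samples while the total stays within the budget, which is the trade-off that matters in fixed-compute settings.&lt;/p&gt;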
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;✅ Conclusion and Practical Takeaways&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;TTS is a technique that makes a pretrained model reason better &amp;ldquo;in the moment&amp;rdquo;, at inference time&lt;/b&gt;.&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;It is especially useful &lt;b&gt;when you want strong performance from a small model&lt;/b&gt;, or &lt;b&gt;when you want responses tailored to a specific problem without retraining&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;</description>
      <category>With AI</category>
      <category>llm tuning</category>
      <category>test-time scaling</category>
      <category>TTS</category>
      <author>Joon09</author>
      <guid isPermaLink="true">https://datacook.tistory.com/137</guid>
      <comments>https://datacook.tistory.com/137#entry137comment</comments>
      <pubDate>Thu, 27 Mar 2025 11:16:15 +0900</pubDate>
    </item>
    <item>
      <title>Reinforced LLMs: Optimizing LLMs with Reinforcement Learning</title>
      <link>https://datacook.tistory.com/136</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;Reinforcement Learning (RL) is one of the core methods for aligning large language models (LLMs) with user preferences and optimizing them for complex reasoning tasks. In the post-training stage, it is typically carried out as a three-step pipeline: &lt;b&gt;Supervised Fine-Tuning (SFT) &amp;rarr; Reward Modeling &amp;rarr; Policy Optimization&lt;/b&gt;.&lt;/p&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;1. Overview of RL-Based LLM Optimization&lt;/h3&gt;
&lt;div style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;table&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;div data-testid=&quot;table-alignment-container&quot;&gt;
&lt;div data-testid=&quot;table-container&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot;&gt;
&lt;div&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-table-width=&quot;1034&quot; data-table-local-id=&quot;9e52c10d-9715-4bee-9427-16f5d5394906&quot; data-autosize=&quot;false&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Stage&lt;/td&gt;
&lt;td&gt;Description&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;SFT&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Learns the basic response format, style, and similar conventions from high-quality labeled data&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Reward model training&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Turns preferences judged by humans or AI into a numeric score of response quality&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;RL policy optimization&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Adjusts the response policy against the reward signal with algorithms such as PPO or GRPO; DPO instead optimizes directly on preference data, without a separate reward model&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;2. Reward Modeling&lt;/h3&gt;
&lt;div style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;table&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;div data-testid=&quot;table-alignment-container&quot;&gt;
&lt;div data-testid=&quot;table-container&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot;&gt;
&lt;div&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-table-width=&quot;1030&quot; data-table-local-id=&quot;16c58ec4-f162-4093-aa90-afef760614f0&quot; data-autosize=&quot;false&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Type&lt;/td&gt;
&lt;td&gt;Description&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Explicit reward&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Quantitative signal based on manual labels or expert evaluation&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Implicit reward&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Reward estimated from user behavior (click-through rate, dwell time, etc.)&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Outcome Reward Model (ORM)&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Reward based on the outcome (whether the final answer is correct)&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Process Reward Model (PRM)&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Reward based on the reasoning process (the logical steps); effective for math and coding&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Iterative Reward&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;The policy model and the reward model are co-evolved over repeated rounds&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
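&lt;p data-ke-size=&quot;size16&quot;&gt;To make the ORM and PRM rows concrete, here is a minimal sketch contrasting the two signals on a toy arithmetic solution. The functions &lt;code&gt;outcome_reward&lt;/code&gt; and &lt;code&gt;process_reward&lt;/code&gt; are hypothetical stand-ins: a real ORM or PRM would be a learned model, and the exact-match and arithmetic checks here are assumed only for illustration.&lt;/p&gt;
&lt;pre&gt;
```python
def outcome_reward(final_answer, reference):
    """ORM-style signal: only the final answer is checked."""
    return 1.0 if final_answer == reference else 0.0


def process_reward(steps):
    """PRM-style signal: every intermediate step gets its own score.

    Each step is a (left, right, claimed_sum) triple. The arithmetic check
    is a stand-in for a learned step-level scorer.
    """
    return [1.0 if left + right == claimed else 0.0
            for left, right, claimed in steps]


# A solution that reaches the right final answer through one wrong step.
steps = [(2, 3, 5), (5, 4, 10), (10, -1, 9)]

print(outcome_reward(9, 9))   # prints 1.0
print(process_reward(steps))  # prints [1.0, 0.0, 1.0]
```
&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The outcome signal rates this solution as fully correct, while the step-level signal flags the faulty second step. That difference is why process-level rewards are attractive for multi-step reasoning.&lt;/p&gt;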
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;3. Summary of Representative RL Methods&lt;/h3&gt;
&lt;div style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;table&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;div data-testid=&quot;table-alignment-container&quot;&gt;
&lt;div data-testid=&quot;table-container&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot;&gt;
&lt;div&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-table-width=&quot;1034&quot; data-table-local-id=&quot;e91f05c7-0833-41db-9648-bbdf3546a00c&quot; data-autosize=&quot;false&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Method&lt;/td&gt;
&lt;td&gt;Overview&lt;/td&gt;
&lt;td&gt;Characteristics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;PPO&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Stable training by limiting how far each policy update can move (clipped objective or KL penalty)&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;The standard choice in RLHF&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;TRPO&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Optimizes within a bounded policy change (trust region)&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Strong theoretical stability, high computational cost&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;DPO&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Trains the policy directly on preference data, without an explicit reward model&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Direct optimization based on log-likelihood&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;GRPO&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Computes each advantage relative to the average reward of a group of responses&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Efficient training without a value function&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;OREO&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Offline RL based on the Bellman equation&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Suited to math and planning problems&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;ORPO&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Optimizes the odds ratio of preferred to dispreferred responses&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Simple to implement, but hard to combine multiple signals&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;RLAIF&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;The feedback behind the reward model is generated by AI&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Lower human labeling cost, easier to scale up training&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
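&lt;p data-ke-size=&quot;size16&quot;&gt;The GRPO row can be sketched in a few lines: instead of querying a learned value function, each response is scored against the other responses sampled for the same prompt. The version below normalizes by the group mean and standard deviation, which is the commonly described form; the function name &lt;code&gt;group_relative_advantages&lt;/code&gt; and the &lt;code&gt;eps&lt;/code&gt; stabilizer are assumptions of this example.&lt;/p&gt;
&lt;pre&gt;
```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Advantage of each response relative to its own sampling group.

    rewards: one scalar reward per response sampled for the same prompt.
    eps: small constant (assumed here) that avoids division by zero when
         every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four responses to one prompt: two judged correct, two judged wrong.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # prints [1.0, -1.0, -1.0, 1.0]
```
&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Because the baseline is just the group average, no separate critic network has to be trained or kept in memory.&lt;/p&gt;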
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;4. Case Study: DeepSeek-R1&lt;/h3&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;paragraph&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;DeepSeek-R1&lt;/b&gt; is a representative RL-trained LLM and follows this pipeline:&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal; color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;orderedList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Cold Start:&lt;/b&gt; Fine-tunes on a small set of CoT-style data so that RL starts from a stable point&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Reasoning-oriented RL:&lt;/b&gt; Strengthens multi-step reasoning in math, logic, and similar tasks, using rule-based rewards (answer accuracy, output format) rather than a PRM&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Rejection Sampling + SFT:&lt;/b&gt; Selects high-quality responses &amp;rarr; reuses them as supervised data&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Second-stage RL:&lt;/b&gt; Applies additional rewards to improve alignment, such as helpfulness and safety&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Distillation:&lt;/b&gt; Transfers the large model's capability to smaller models (Qwen and LLaMA families, among others)&lt;/li&gt;
&lt;/ol&gt;
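&lt;p data-ke-size=&quot;size16&quot;&gt;The Rejection Sampling + SFT step in the list above can be sketched as a simple filter loop: sample several candidates per prompt, keep only those that pass a quality check, and store the survivors as supervised pairs. Everything named here (&lt;code&gt;rejection_sample&lt;/code&gt;, &lt;code&gt;toy_generate&lt;/code&gt;, &lt;code&gt;toy_check&lt;/code&gt;) is hypothetical; in practice the generator would be the RL-trained model and the check would be a verifier or reward model.&lt;/p&gt;
&lt;pre&gt;
```python
def rejection_sample(prompts, generate, is_acceptable, num_candidates=4):
    """Collect (prompt, response) pairs whose responses pass a quality check."""
    dataset = []
    for prompt in prompts:
        for k in range(num_candidates):
            response = generate(prompt, k)
            if is_acceptable(prompt, response):
                dataset.append({"prompt": prompt, "response": response})
    return dataset


def toy_generate(prompt, k):
    # Stand-in for model sampling: every other candidate is off by one.
    a, b = prompt.split("+")
    return str(int(a) + int(b) + (k % 2))


def toy_check(prompt, response):
    # Stand-in for a verifier: accept only the exact sum.
    a, b = prompt.split("+")
    return int(response) == int(a) + int(b)


sft_data = rejection_sample(["1+1", "2+3"], toy_generate, toy_check)
print(len(sft_data))  # prints 4
print(sft_data[0])    # prints {'prompt': '1+1', 'response': '2'}
```
&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The resulting list has the same shape as an ordinary SFT dataset, so it can be fed to a standard fine-tuning run.&lt;/p&gt;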
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;5. Comparison of RL Methods&lt;/h3&gt;
&lt;div style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;table&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;div data-testid=&quot;table-alignment-container&quot;&gt;
&lt;div data-testid=&quot;table-container&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot;&gt;
&lt;div&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-table-width=&quot;1011&quot; data-table-local-id=&quot;110a9661-e08d-4c68-ab71-0d1e765973c4&quot; data-autosize=&quot;false&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Item&lt;/td&gt;
&lt;td&gt;PPO&lt;/td&gt;
&lt;td&gt;DPO&lt;/td&gt;
&lt;td&gt;GRPO&lt;/td&gt;
&lt;td&gt;TRPO&lt;/td&gt;
&lt;td&gt;&lt;span&gt;RLAIF&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;OREO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Training approach&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Reward-model-based RL&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Direct preference optimization&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Group-based advantage estimation&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Trust-region constrained optimization&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;AI-feedback-based&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Bellman-based offline learning&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Reward required&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Required&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Not required&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Required&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Required&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Required&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Required&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Complexity&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Medium&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Low&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Low&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;High&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Low&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;High&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Value Function&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Required&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Not required&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Not required&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Required&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Not required&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Required&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Characteristics&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Stable, core of RLHF&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Simple and effective&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Memory-efficient&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Theoretical rigor&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Cost reduction&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Suited to math/planning problems&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
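&lt;p data-ke-size=&quot;size16&quot;&gt;The DPO column in the table above (no reward model, no value function) follows from its loss, which needs only sequence log-probabilities from the policy and from a frozen reference model. The sketch below computes that loss for a single preference pair; the function name &lt;code&gt;dpo_loss&lt;/code&gt;, the argument names, and the default &lt;code&gt;beta=0.1&lt;/code&gt; are assumptions of this example, and the inputs are taken to be summed token log-probabilities.&lt;/p&gt;
&lt;pre&gt;
```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses.

    Each argument is the log-probability of a full response under the
    policy or the frozen reference model. beta scales how strongly the
    policy is pushed away from the reference.
    """
    # How much more the policy prefers the chosen response than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(x)) rewritten as log(1 + exp(-x)).
    return math.log1p(math.exp(-beta * margin))


# Policy identical to the reference: the loss equals log(2).
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
# Policy already favors the chosen response: the loss is smaller.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```
&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Minimizing this quantity over a preference dataset raises the relative likelihood of preferred responses without ever fitting an explicit reward function.&lt;/p&gt;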
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;✅ Conclusion&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;RL-based optimization is a core technique for evolving LLMs toward being &lt;b&gt;user-oriented, context-aligned, and safety-focused&lt;/b&gt;.&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Recent trends are moving toward &lt;b&gt;dropping or simplifying auxiliary models (DPO removes the reward model, GRPO removes the value function)&lt;/b&gt;, &lt;b&gt;AI-generated rewards (RLAIF)&lt;/b&gt;, and &lt;b&gt;stronger advanced reasoning (OREO, GRPO)&lt;/b&gt;.&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;In particular, the RL + Distillation strategy is popular in practice because it delivers &lt;b&gt;both high performance and a lightweight model&lt;/b&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Next post&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://datacook.tistory.com/137&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;2025.03.27 - [With AI] - Test-Time Scaling(TTS) Methods for Reasoning LLMs&lt;/a&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>With AI</category>
      <category>llm tuning</category>
      <category>llm reinforcement learning</category>
      <author>Joon09</author>
      <guid isPermaLink="true">https://datacook.tistory.com/136</guid>
      <comments>https://datacook.tistory.com/136#entry136comment</comments>
      <pubDate>Thu, 27 Mar 2025 11:14:55 +0900</pubDate>
    </item>
    <item>
      <title>Supervised Finetuning in Large Language Models (LLMs)</title>
      <link>https://datacook.tistory.com/135</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;Supervised Finetuning (SFT) is the most basic component of the&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;post-training&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;process for large language models (LLMs): it adapts a model to a specific purpose using&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;human-labeled data&lt;/b&gt;. The main forms of SFT and the techniques behind each are described below.&lt;/p&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;&lt;span data-prosemirror-mark-name=&quot;backgroundColor&quot; data-prosemirror-content-type=&quot;mark&quot; data-background-custom-color=&quot;#fdd0ec&quot;&gt;  1. Instruction Finetuning&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Concept:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;Train the model on a large dataset of prompt (instruction) and response (completion) pairs so that it follows user instructions accurately and helpfully.&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Main effect:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;Strong performance across a variety of tasks, even in zero-shot or few-shot settings&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Example models:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;T0, FLAN, Alpaca, Vicuna, Dolly&lt;/li&gt;
&lt;/ul&gt;
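&lt;p data-ke-size=&quot;size16&quot;&gt;The prompt-response format above can be sketched roughly as follows. The template string and the whitespace tokenizer are hypothetical stand-ins, and masking the prompt tokens out of the loss is a common practice assumed here, not something this post specifies.&lt;/p&gt;

```python
# Hypothetical prompt template; real projects define their own format.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"


def build_example(instruction, response, tokenize=str.split):
    """Return tokens plus a loss mask that is 1 only on response tokens."""
    prompt_tokens = tokenize(PROMPT_TEMPLATE.format(instruction=instruction))
    response_tokens = tokenize(response)
    tokens = prompt_tokens + response_tokens
    # 0 = ignored by the loss (prompt), 1 = trained on (response).
    loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    return {"tokens": tokens, "loss_mask": loss_mask}


example = build_example("Translate 'hello' to French.", "bonjour")
```

&lt;p data-ke-size=&quot;size16&quot;&gt;With this masking, the model is only penalized for errors in the response, so it learns to answer the instruction instead of reproducing it.&lt;/p&gt;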
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;  2. Dialogue (Multi-turn) Finetuning&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Concept:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;Train on continuous conversations (user &amp;harr; system) to improve&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;context retention&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;and&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;conversational naturalness&lt;/b&gt;&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Difference:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;Instruction tuning uses a single prompt-response pair, whereas dialogue tuning learns the flow across multiple turns&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Example models:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;ChatGPT, LaMDA&lt;/li&gt;
&lt;/ul&gt;
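&lt;p data-ke-size=&quot;size16&quot;&gt;To illustrate how multi-turn data differs from single prompt-response pairs, the sketch below turns one conversation into one training sample per assistant turn, each carrying the full history so far. The role names and the plain-text layout are illustrative assumptions, not a real chat template.&lt;/p&gt;

```python
def flatten_dialogue(turns):
    """Build one training sample per assistant turn from (role, text) pairs.

    Each sample's context holds every earlier turn, so the model sees the
    conversation history and not just the latest user message.
    """
    samples = []
    history = []
    for role, text in turns:
        if role == "assistant":
            context = "\n".join(history + ["assistant:"])
            samples.append({"context": context, "target": text})
        history.append(f"{role}: {text}")
    return samples


dialogue = [
    ("user", "Recommend a book."),
    ("assistant", "Try 'Dune'."),
    ("user", "Who wrote it?"),
    ("assistant", "Frank Herbert."),
]
samples = flatten_dialogue(dialogue)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;The second sample can only be answered correctly by using the first exchange, which is the context-retention behavior that dialogue finetuning aims to teach.&lt;/p&gt;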
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;  3. Chain-of-Thought (CoT) Reasoning Finetuning&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Concept:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;Supervise the model to generate the&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;intermediate steps of its reasoning&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;rather than only the final result &amp;rarr; improved interpretability and accuracy&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Data:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;Step-by-step reasoning data, written by hand or generated by a model&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Example models/techniques:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;Chain-of-Thought Prompting, Self-Consistency, CoT Distillation, LLaVA-CoT, LlamaV-o1 (multimodal)&lt;/li&gt;
&lt;/ul&gt;
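&lt;p data-ke-size=&quot;size16&quot;&gt;Of the techniques listed above, Self-Consistency is the simplest to show in code: sample several reasoning chains, extract each final answer, and keep the majority answer. A minimal sketch, in which the list of answers is made-up illustrative data and not model output:&lt;/p&gt;

```python
from collections import Counter


def self_consistency(sampled_answers):
    """Return the most frequent final answer and its share of the votes."""
    votes = Counter(sampled_answers)
    answer, count = votes.most_common(1)[0]
    return answer, count / len(sampled_answers)


# Hypothetical final answers extracted from five sampled reasoning chains.
answer, agreement = self_consistency(["42", "42", "41", "42", "40"])
```

&lt;p data-ke-size=&quot;size16&quot;&gt;The agreement ratio can also serve as a rough filter when model-generated reasoning traces are collected as finetuning data, although that use is an assumption and is not described in this post.&lt;/p&gt;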
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;&lt;span data-prosemirror-mark-name=&quot;backgroundColor&quot; data-prosemirror-content-type=&quot;mark&quot; data-background-custom-color=&quot;#fdd0ec&quot;&gt;  4. Domain-Specific Finetuning&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Concept:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;Use text and annotated data from a specialized field (e.g., medicine, finance, law) to strengthen domain expertise&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Applications:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;Classification, retrieval, QA, etc.&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Example models:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;BioGPT, BiMediX (medical), FinBERT (finance), ClimateGPT (climate), CodeT5 (code)&lt;/li&gt;
&lt;/ul&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;&lt;span data-prosemirror-mark-name=&quot;backgroundColor&quot; data-prosemirror-content-type=&quot;mark&quot; data-background-custom-color=&quot;#fdd0ec&quot;&gt;  5. Distillation-Based Finetuning&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Concept:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;A small student model learns from the responses and reasoning traces generated by a large teacher model&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Advantages:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;A lightweight model can be trained quickly and efficiently, cutting cost while maintaining performance&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Example techniques:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;CoT Distillation, Step-by-Step Distillation&lt;/li&gt;
&lt;/ul&gt;
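&lt;p data-ke-size=&quot;size16&quot;&gt;The teacher-student setup above amounts to building an SFT dataset whose targets come from the teacher. In the sketch below, &lt;b&gt;toy_teacher&lt;/b&gt; is a hypothetical stand-in for a large model, and the target format (rationale followed by the answer) is an illustrative assumption.&lt;/p&gt;

```python
def build_distillation_set(prompts, teacher):
    """Collect teacher outputs as supervised targets for a student model."""
    dataset = []
    for prompt in prompts:
        rationale, answer = teacher(prompt)
        # The student is trained to reproduce the rationale and the answer.
        dataset.append({"input": prompt, "target": f"{rationale} Answer: {answer}"})
    return dataset


def toy_teacher(prompt):
    """Hypothetical stand-in for a large model: adds the two integers in the prompt."""
    a, b = (int(tok) for tok in prompt.split() if tok.isdigit())
    return f"{a} plus {b} equals {a + b}.", str(a + b)


data = build_distillation_set(["What is 2 + 3 ?"], toy_teacher)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;The resulting records can then be fed to an ordinary SFT loop for the student model.&lt;/p&gt;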
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;  6. Preference &amp;amp; Alignment Finetuning&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Concept:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;Train on data in which humans have labeled responses as desirable or undesirable &amp;rarr;&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;improved safety, reduced harmful output&lt;/b&gt;&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Relationship to RLHF:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;This is the stage before RL: supervised learning on preference data is performed first&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Examples:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;InstructGPT, OpenAI Alignment Process&lt;/li&gt;
&lt;/ul&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;&lt;span data-prosemirror-mark-name=&quot;backgroundColor&quot; data-prosemirror-content-type=&quot;mark&quot; data-background-custom-color=&quot;#fdd0ec&quot;&gt;  7. Efficient Finetuning (Parameter-Efficient Fine-Tuning, PEFT)&lt;/span&gt;&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Problem:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;Fine-tuning every parameter of a large model is prohibitively expensive&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Solution:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;Train only a small set of parameters (LoRA, Prefix Tuning, Adapters, etc.) and freeze the rest&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Combined strategy:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;PEFT + quantization (QLoRA) + pruning (SparseGPT) &amp;rarr; training becomes feasible even on low-cost GPUs&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Frameworks:&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;PEFT (Hugging Face), QLoRA, bitsandbytes, etc.&lt;/li&gt;
&lt;/ul&gt;
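&lt;p data-ke-size=&quot;size16&quot;&gt;To make the cost argument concrete, the sketch below counts trainable parameters for a single weight matrix under full fine-tuning and under LoRA. The 4096 x 4096 layer size and rank 8 are illustrative assumptions and are not taken from any particular model.&lt;/p&gt;

```python
def lora_param_counts(d_out, d_in, rank):
    """Compare trainable parameters: full weight matrix vs. LoRA factors.

    LoRA keeps the d_out x d_in weight W frozen and trains two small
    matrices B (d_out x rank) and A (rank x d_in), so the update is B @ A.
    """
    full = d_out * d_in
    lora = d_out * rank + rank * d_in
    return full, lora


full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
ratio = lora / full  # fraction of this layer's parameters that LoRA trains
```

&lt;p data-ke-size=&quot;size16&quot;&gt;For this layer, LoRA trains 65,536 parameters instead of 16,777,216, or about 0.39% of the original count.&lt;/p&gt;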
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;✅ Comparison Summary of SFT Approaches&lt;/h3&gt;
&lt;div style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;table&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;div data-testid=&quot;table-alignment-container&quot;&gt;
&lt;div data-testid=&quot;table-container&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot;&gt;
&lt;div&gt;&lt;br /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%; height: 162px;&quot; border=&quot;1&quot; data-table-width=&quot;964&quot; data-table-local-id=&quot;2019b851-1668-4fe1-8948-a0645cdf144f&quot; data-autosize=&quot;false&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr style=&quot;height: 18px;&quot;&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;&lt;span&gt;Type&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;&lt;span&gt;Purpose&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;&lt;span&gt;Representative techniques/examples&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Instruction&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;General task-following ability&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;FLAN, Alpaca&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Dialogue&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Handling multi-turn conversations&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;ChatGPT, LaMDA&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;CoT Reasoning&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Strengthening step-by-step reasoning&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;CoT, Self-Consistency&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Domain-Specific&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Internalizing expert knowledge&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;BioGPT, FinBERT&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Distillation&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Knowledge transfer to lightweight models&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;CoT Distillation&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Preference Alignment&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Eliciting safe and helpful responses&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;InstructGPT, early stage of RLHF&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Efficient Finetuning&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Low-cost tuning&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;LoRA, QLoRA, PEFT&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;  Conclusion&lt;/h3&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;paragraph&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size16&quot;&gt;Supervised Finetuning is a&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;core technique for aligning LLMs with real-world problems and making them applicable&lt;/b&gt;. Recently, rather than retraining all parameters, it has increasingly been combined with&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;efficient post-hoc tuning (PEFT)&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;strategies to&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;balance performance and cost&lt;/b&gt;. Instruction- and preference-based training is also essential for improving the safety and user alignment of LLMs.&lt;/p&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;paragraph&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;paragraph&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size16&quot;&gt;Next post&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://datacook.tistory.com/136&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;2025.03.27 - [With AI] - Reinforced LLMs: Optimizing LLMs with Reinforcement Learning&lt;/a&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>With AI</category>
      <category>llm tuning</category>
      <category>supervised finetuning</category>
      <author>Joon09</author>
      <guid isPermaLink="true">https://datacook.tistory.com/135</guid>
      <comments>https://datacook.tistory.com/135#entry135comment</comments>
      <pubDate>Thu, 27 Mar 2025 11:13:33 +0900</pubDate>
    </item>
    <item>
      <title>Introduction: Why Post-training for LLMs Matters</title>
      <link>https://datacook.tistory.com/134</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;Modern large language models (LLMs) go beyond simple text generation and perform strongly in areas ranging from complex&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;multi-step reasoning&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;and&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;automated content generation&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;to&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;multimodal interaction&lt;/b&gt;. However, they still suffer from limitations such as&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;hallucination (factual distortion)&lt;/b&gt;,&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;illogical responses&lt;/b&gt;, and&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;misalignment with user values (alignment mismatch)&lt;/b&gt;, which makes&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;post-training&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;strategies essential.&lt;/p&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;paragraph&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size16&quot;&gt;Against this background, the paper examines LLM post-training along three axes:&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Fine-Tuning, Reinforcement Learning, and Test-Time Scaling&lt;/b&gt;.&lt;/p&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;  1. Why Post-training Is Needed and How It Is Defined&lt;/h3&gt;
&lt;div style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;table&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;div data-testid=&quot;table-alignment-container&quot;&gt;
&lt;div data-testid=&quot;table-container&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot;&gt;
&lt;div&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-table-width=&quot;1028&quot; data-table-local-id=&quot;fd724b7d-afc0-4872-8d34-5a2718fd9c48&quot; data-autosize=&quot;false&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;span&gt;&lt;b&gt;Item&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;&lt;span&gt;Description&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;313&quot;&gt;&lt;span&gt;&lt;b&gt;Problems&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;714&quot;&gt;&lt;span&gt;Hallucination (generating incorrect content), logical inconsistency, unethical responses, etc.&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;313&quot;&gt;&lt;span&gt;&lt;b&gt;Cause&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;714&quot;&gt;&lt;span&gt;LLM &amp;lsquo;reasoning&amp;rsquo; is&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;probabilistic&lt;/b&gt;, not based on explicit rules as human reasoning is&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;313&quot;&gt;&lt;span&gt;&lt;b&gt;Post-training goals&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;714&quot;&gt;&lt;span&gt;Ensuring alignment, factuality, and context-awareness&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;313&quot;&gt;&lt;span&gt;&lt;b&gt;LLM training pipeline&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;714&quot;&gt;&lt;span&gt;&lt;span data-prosemirror-mark-name=&quot;code&quot; data-prosemirror-content-type=&quot;mark&quot;&gt;Pre-training&lt;/span&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;+&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;span data-prosemirror-mark-name=&quot;code&quot; data-prosemirror-content-type=&quot;mark&quot;&gt;Post-training&lt;/span&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;(Fine-tuning / RL / Scaling, etc.)&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;  2. The Main Post-training Axes&lt;/h3&gt;
&lt;div style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;table&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;div data-testid=&quot;table-alignment-container&quot;&gt;
&lt;div data-testid=&quot;table-container&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot;&gt;
&lt;div&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-table-width=&quot;1024&quot; data-table-local-id=&quot;c776704d-7470-43db-a968-d55de897ecf5&quot; data-autosize=&quot;false&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Category&lt;/td&gt;
&lt;td&gt;Key strategies&lt;/td&gt;
&lt;td&gt;Goals and characteristics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;310&quot;&gt;&lt;span&gt;&lt;b&gt;Fine-tuning&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;372&quot;&gt;&lt;span&gt;SFT, PEFT, Domain FT, etc.&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;341&quot;&gt;&lt;span&gt;Optimized for specific tasks, precision &amp;uarr;, but prone to overfitting and generality &amp;darr;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;310&quot;&gt;&lt;span&gt;&lt;b&gt;Reinforcement Learning&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;372&quot;&gt;&lt;span&gt;RLHF, DPO, GRPO, etc.&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;341&quot;&gt;&lt;span&gt;Reflects user preferences and dynamic interaction; strengthens alignment and reasoning&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;310&quot;&gt;&lt;span&gt;&lt;b&gt;Test-time Scaling&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;372&quot;&gt;&lt;span&gt;CoT, Tree-of-Thought (ToT), ToT + Beam Search, etc.&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;341&quot;&gt;&lt;span&gt;Improves accuracy by allocating compute dynamically during inference&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;  3. How This Survey Differs from Prior Work&lt;/h3&gt;
&lt;div style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;table&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;div data-testid=&quot;table-alignment-container&quot;&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div data-testid=&quot;table-container&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot;&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-table-width=&quot;1011&quot; data-table-local-id=&quot;b7dc1014-f04e-4e58-b796-eba7610d3baa&quot; data-autosize=&quot;false&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Item&lt;/td&gt;
&lt;td&gt;Limitations of prior surveys&lt;/td&gt;
&lt;td&gt;Contributions of this paper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Scope&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;RL-centric; SFT and scaling are covered only thinly&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Covers SFT, RL, and TTS together in one integrated treatment&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Practical tooling&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Little implementation guidance; hard to apply in practice&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Provides benchmarks, datasets, and tutorials&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Technical spectrum&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Mostly RLHF; recent methods are not covered&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Includes recent methods such as GRPO, DPO, and OREO&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;  4. Key Contributions of the Paper&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Comprehensively reviews LLM post-training strategies, organized into&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Fine-tuning, RL, and Test-Time Scaling&lt;/b&gt;&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Consolidates RL methods (DPO, PPO, GRPO, etc.) in terms of both&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;theoretical foundations and practical case studies&lt;/b&gt;&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Presents&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;benchmarks, datasets, and implementation guides&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;for real-world applications&lt;/li&gt;
&lt;/ul&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;  5. Taxonomy of LLM Post-training Techniques (based on Fig. 1)&lt;/h3&gt;
&lt;div style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;table&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;div data-testid=&quot;table-alignment-container&quot;&gt;
&lt;div data-testid=&quot;table-container&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot;&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-table-width=&quot;1011&quot; data-table-local-id=&quot;2cddd4cb-ded1-4f88-8f69-31d2ba3357e0&quot; data-autosize=&quot;false&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Category&lt;/td&gt;
&lt;td&gt;Key techniques&lt;/td&gt;
&lt;td&gt;Representative models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Fine-tuning&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;SFT, LoRA, PEFT, Adapters&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;LLaMA 3, Falcon, Phi-4&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;RL-based training&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;RLHF, DPO, GRPO, RLAIF&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;GPT-4, Claude 3, DeepSeek-R1&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Test-Time Scaling&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;CoT, ToT, Beam Search, Confidence Sampling&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;DeepSeek-R1, Starling, Qwen2&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;blockquote style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;blockquote&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot;&gt;
&lt;p data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;paragraph&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size16&quot;&gt; &lt;span&gt;&amp;nbsp;&lt;/span&gt;A label of the form &amp;lsquo;141B-A39B&amp;rsquo; refers to a Mixture-of-Experts architecture: it gives the total parameter count (141B) followed by the active parameter count (39B).&lt;/p&gt;
&lt;/blockquote&gt;
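&lt;p data-ke-size=&quot;size16&quot;&gt;To make the naming convention concrete, here is a minimal Python sketch that splits such a label into its two numbers and computes the share of parameters that is active. The helper names parse_moe_label and active_fraction are hypothetical and are not taken from the paper; the sketch also assumes both counts are written in billions with a trailing B.&lt;/p&gt;

```python
import re

# Hypothetical helpers for illustration only (not from the paper).
# Assumes labels look like "141B-A39B": total params, then active params,
# both written in billions.
LABEL_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)B-A(\d+(?:\.\d+)?)B$")


def parse_moe_label(label):
    """Return (total_billions, active_billions) for a label such as '141B-A39B'."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError("unrecognized label: " + label)
    return float(match.group(1)), float(match.group(2))


def active_fraction(label):
    """Share of the total parameters that is active."""
    total_b, active_b = parse_moe_label(label)
    return active_b / total_b


total_b, active_b = parse_moe_label("141B-A39B")
print(total_b, active_b)                        # prints 141.0 39.0
print(round(active_fraction("141B-A39B"), 3))   # prints 0.277
```

&lt;p data-ke-size=&quot;size16&quot;&gt;For the label in the note, roughly 28% of the parameters are active, which is why the two numbers are reported separately.&lt;/p&gt;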
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;✅ Summary of Conclusions&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;The reasoning limitations of LLMs cannot be resolved by data-driven pretraining alone&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Post-training is a key strategy for making LLMs usable in real applications&lt;/b&gt;&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;Fine-tuning, reinforcement learning, and test-time scaling must be combined in a complementary way to deliver real performance gains&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;The paper moves away from the fragmented approach of earlier work and aims for&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;an end-to-end structural overview and practical applicability&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Next post&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://datacook.tistory.com/135&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;2025.03.27 - [All Categories] - Supervised Finetuning in Large Language Models (LLMs)&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1743041953948&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;article&quot; data-og-title=&quot;Supervised Finetuning in Large Language Models (LLMs)&quot; data-og-description=&quot;Supervised Finetuning(SFT)은 대형 언어 모델(LLM)의&amp;nbsp;사후 학습(post-training)&amp;nbsp;과정에서 가장 기본이 되는 구성 요소로,&amp;nbsp;사람이 라벨링한 데이터를 이용하여 모델을 특정 목적에 맞게 조정하는 기법입니&quot; data-og-host=&quot;datacook.tistory.com&quot; data-og-source-url=&quot;https://datacook.tistory.com/135&quot; data-og-url=&quot;https://datacook.tistory.com/135&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/zMwEH/hyYvtQvo53/ZxvLpcLKjcBR2sJsyeH5M0/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800,https://scrap.kakaocdn.net/dn/w4KBA/hyYyTUib5m/IfpgzG36rZX85oBh6gjPwK/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800&quot;&gt;&lt;a href=&quot;https://datacook.tistory.com/135&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://datacook.tistory.com/135&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/zMwEH/hyYvtQvo53/ZxvLpcLKjcBR2sJsyeH5M0/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800,https://scrap.kakaocdn.net/dn/w4KBA/hyYyTUib5m/IfpgzG36rZX85oBh6gjPwK/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;Supervised Finetuning in Large Language Models (LLMs)&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;Supervised Finetuning (SFT) is the most basic building block of&amp;nbsp;post-training&amp;nbsp;for large language models (LLMs):&amp;nbsp;a technique that adapts a model to a specific purpose using human-labeled data&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;datacook.tistory.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>AI와 함께</category>
      <category>llm post training</category>
      <author>Joon09</author>
      <guid isPermaLink="true">https://datacook.tistory.com/134</guid>
      <comments>https://datacook.tistory.com/134#entry134comment</comments>
      <pubDate>Thu, 27 Mar 2025 11:12:32 +0900</pubDate>
    </item>
    <item>
      <title>Overview of Post-Training Strategies for Large Language Models (LLMs)</title>
      <link>https://datacook.tistory.com/133</link>
      <description>&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-pm-slice=&quot;0 0 []&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size26&quot;&gt;AI Paper Review&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://arxiv.org/html/2502.21321&quot; target=&quot;_blank&quot; rel=&quot;noopener&amp;nbsp;noreferrer&quot;&gt;https://arxiv.org/html/2502.21321&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1743041844376&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;website&quot; data-og-title=&quot;LLM Post-Training: A Deep Dive into Reasoning Large Language Models&quot; data-og-description=&quot;LLM Post-Training: A Deep Dive into Reasoning Large Language Models Komal Kumar&amp;lowast;, Tajamul Ashraf&amp;lowast;, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H.S. Torr, Fahad Shahbaz Khan, Salman Khan &amp;lowast;Equal contribu&quot; data-og-host=&quot;arxiv.org&quot; data-og-source-url=&quot;https://arxiv.org/html/2502.21321&quot; data-og-url=&quot;https://arxiv.org/html/2502.21321&quot; data-og-image=&quot;&quot;&gt;&lt;a href=&quot;https://arxiv.org/html/2502.21321&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://arxiv.org/html/2502.21321&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url();&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;LLM Post-Training: A Deep Dive into Reasoning Large Language Models&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;LLM Post-Training: A Deep Dive into Reasoning Large Language Models Komal Kumar&amp;lowast;, Tajamul Ashraf&amp;lowast;, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H.S. Torr, Fahad Shahbaz Khan, Salman Khan &amp;lowast;Equal contribu&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;arxiv.org&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;This post was summarized with AI and is split into five parts.&lt;/p&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-pm-slice=&quot;0 0 []&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size26&quot;&gt;&amp;nbsp;&lt;/h2&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-pm-slice=&quot;0 0 []&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size26&quot;&gt;1. Why Does Post-Training Matter? (Motivation)&lt;/h2&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;paragraph&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size16&quot;&gt;Large language models (LLMs) can handle a wide range of language tasks with pretraining alone, but they have the following&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;important limitations&lt;/b&gt;:&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Lack of logical consistency&lt;/b&gt;: without CoT, responses tend to stop at short answers&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Factual errors (hallucination)&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;can occur&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Responses may not match the user's intent or may be harmful&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;paragraph&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size16&quot;&gt;To adapt an LLM's capabilities to real application settings,&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;post-training&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;strategies are introduced. They are organized along the three axes below:&lt;/p&gt;
&lt;blockquote style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;blockquote&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot;&gt;
&lt;p data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;paragraph&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size16&quot;&gt; &lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Fine-Tuning (supervised fine-tuning)&lt;/b&gt;&lt;br /&gt; &lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Reinforcement Learning (RL-based alignment)&lt;/b&gt;&lt;br /&gt; &lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Test-Time Scaling (scaling at inference time)&lt;/b&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size26&quot;&gt;2. Summary of the Core Components of Post-Training&lt;/h2&gt;
&lt;div style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;table&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;div data-testid=&quot;table-alignment-container&quot;&gt;
&lt;div data-testid=&quot;table-container&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot;&gt;
&lt;div&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%; height: 126px;&quot; border=&quot;1&quot; data-table-width=&quot;1011&quot; data-table-local-id=&quot;70389b3c-bad2-463b-aaca-3f01d0aad9de&quot; data-autosize=&quot;false&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr style=&quot;height: 18px;&quot;&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;&lt;span&gt;&lt;b&gt;&lt;span style=&quot;color: #000000; text-align: start;&quot;&gt;Category&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;&lt;span&gt;&lt;span style=&quot;color: #000000; text-align: start;&quot;&gt;Methods&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;&lt;span&gt;&lt;span style=&quot;color: #000000; text-align: start;&quot;&gt;Purpose&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;&lt;span&gt;&lt;span style=&quot;color: #000000; text-align: start;&quot;&gt;Characteristics&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 36px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td style=&quot;height: 36px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Supervised fine-tuning (SFT)&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 36px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Instruction, dialogue, domain-specific, CoT, and distillation-based tuning&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 36px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Understanding user instructions; domain adaptation&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 36px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Based on fixed datasets; the traditional approach&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 36px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td style=&quot;height: 36px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;RL-based alignment (RL)&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 36px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;RLHF, DPO, GRPO, RLAIF, PPO, ORPO, etc.&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 36px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Reflecting human preferences; ensuring safety&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 36px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Feedback loop; updates driven by a reward model&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 36px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td style=&quot;height: 36px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Inference-time optimization (TTS)&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 36px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Beam Search, CoT, Self-Refinement, MCTS, etc.&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 36px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Improving accuracy at inference time&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 36px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;No training required; compute cost can be tuned&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
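&lt;p data-ke-size=&quot;size16&quot;&gt;As one illustration of the TTS row above, the following minimal Python sketch runs beam search over a toy next-token table. The table TOY_MODEL and the function beam_search are hypothetical stand-ins for a real language model and are not from the paper; the point is only that the beam width is a knob that trades extra inference compute for a better-scoring output, with no change to the model.&lt;/p&gt;

```python
import math

# Hypothetical toy "model": next-token probabilities given the last token.
# A real LLM would supply these probabilities; the numbers are illustrative.
TOY_MODEL = {
    "START": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.4, "y": 0.3, "z": 0.3},
    "b": {"x": 0.9, "y": 0.1},
}


def beam_search(model, start, beam_width, steps):
    """Keep the beam_width best partial sequences by cumulative log-probability."""
    beams = [([start], 0.0)]
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            next_probs = model.get(tokens[-1], {})
            if not next_probs:
                # No continuation defined: carry the finished sequence forward.
                candidates.append((tokens, score))
                continue
            for token, prob in next_probs.items():
                candidates.append((tokens + [token], score + math.log(prob)))
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


greedy_tokens, greedy_score = beam_search(TOY_MODEL, "START", 1, 2)[0]
wide_tokens, wide_score = beam_search(TOY_MODEL, "START", 2, 2)[0]
print(greedy_tokens, round(math.exp(greedy_score), 2))  # prints ['START', 'a', 'x'] 0.24
print(wide_tokens, round(math.exp(wide_score), 2))      # prints ['START', 'b', 'x'] 0.36
```

&lt;p data-ke-size=&quot;size16&quot;&gt;With a beam width of 1 the search is greedy and commits to the locally best first token; widening the beam to 2 costs more computation per step but finds the sequence with the higher overall probability.&lt;/p&gt;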
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size26&quot;&gt;3. Supervised Fine-Tuning&lt;/h2&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;paragraph&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size16&quot;&gt;Supervised fine-tuning is applied in several forms:&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Instruction Tuning&lt;/b&gt;: trains on instruction-response pairs (e.g. FLAN, Alpaca)&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Dialogue Tuning&lt;/b&gt;: maintains coherent multi-turn conversations (e.g. ChatGPT)&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;CoT Tuning&lt;/b&gt;: trains explicitly on step-by-step reasoning traces&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Domain-Specific Tuning&lt;/b&gt;: specializes the model with expert data from fields such as biomedicine, finance, and law&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Distillation&lt;/b&gt;: transfers a large model's reasoning outputs to a smaller model&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;PEFT&lt;/b&gt;: uses parameter-efficient methods such as LoRA and QLoRA&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;blockquote&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot;&gt;
&lt;p data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;paragraph&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size16&quot;&gt;✔️ Fine-tuning is feasible even on small GPUs (e.g. QLoRA + BitsAndBytes)&lt;/p&gt;
&lt;/blockquote&gt;
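&lt;p data-ke-size=&quot;size16&quot;&gt;To show why PEFT methods such as LoRA fit on small GPUs, here is a minimal pure-Python sketch of the low-rank update: the frozen weight W is left untouched and only two small factors, B and A, are trained. The layer sizes, the function names, and the alpha scaling value are assumptions chosen for illustration, and the 4-bit quantization that QLoRA adds on top is not covered.&lt;/p&gt;

```python
# Illustrative LoRA arithmetic; sizes and helper names are assumptions.

def full_finetune_params(d_out, d_in):
    """Trainable parameters when the whole d_out x d_in weight is updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters for B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """y = W x + (alpha / rank) * B (A x), with W kept frozen."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


full = full_finetune_params(4096, 4096)
lora = lora_params(4096, 4096, 8)
print(full, lora, lora / full)   # prints 16777216 65536 0.00390625

W = [[1.0, 0.0], [0.0, 1.0]]     # frozen 2 x 2 weight
A = [[1.0, 1.0]]                 # rank-1 factor, shape 1 x 2
B = [[1.0], [2.0]]               # rank-1 factor, shape 2 x 1
print(lora_forward(W, A, B, [1.0, 2.0], alpha=2.0, rank=1))  # prints [7.0, 14.0]
```

&lt;p data-ke-size=&quot;size16&quot;&gt;In this example the rank-8 adapter trains about 0.4% of the parameters that full fine-tuning of the same layer would update.&lt;/p&gt;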
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size26&quot;&gt;4. RL-Based Alignment (Reinforced LLMs)&lt;/h2&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;paragraph&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size16&quot;&gt;The typical pipeline for aligning an LLM with human preferences or evaluation criteria is as follows:&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal; color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;orderedList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;SFT (supervised learning)&lt;/b&gt;: initializes the model with high-quality example responses&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Reward model (RM)&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;training: a predictive model trained on human preferences or rankings&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Policy optimization (PPO, DPO, etc.)&lt;/b&gt;: updates the policy (the model) in the direction that maximizes reward&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;Key Algorithms&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;RLHF&lt;/b&gt;: reward optimization based on human feedback&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;RLAIF&lt;/b&gt;: replaces human feedback with AI feedback (lower cost)&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;DPO/GRPO&lt;/b&gt;: DPO optimizes the policy directly on preference pairs through log-likelihood ratios, with no separate reward model; GRPO still uses rewards but drops the value (critic) model and scores each sampled response relative to its group&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;OREO&lt;/b&gt;: offline RL based on the soft Bellman equation&lt;/li&gt;
&lt;/ul&gt;
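&lt;p data-ke-size=&quot;size16&quot;&gt;To make the DPO entry above more concrete, here is a minimal sketch of the per-pair DPO loss in plain Python. It assumes the summed log-probabilities of the chosen and rejected responses are already available for both the policy and a frozen reference model; the function name, the argument names, and the numbers are illustrative and not taken from any library.&lt;/p&gt;

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given summed sequence log-probabilities."""
    # How much more strongly the policy prefers the chosen response
    # than the frozen reference model does, scaled by beta.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Hypothetical log-probabilities, for illustration only.
no_preference = dpo_loss(-12.0, -12.0, -12.0, -12.0)
prefers_chosen = dpo_loss(-10.0, -14.0, -12.0, -12.0)
print(no_preference, prefers_chosen)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;The loss falls as the policy shifts probability toward the chosen response relative to the reference, which is why no separate reward model is needed in this formulation.&lt;/p&gt;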
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;Types of Reward Models&lt;/h3&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Outcome Reward Model&lt;/b&gt;: scores whether the final answer is correct&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Process Reward Model&lt;/b&gt;: scores the logical soundness of the reasoning path&lt;/li&gt;
&lt;/ul&gt;
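&lt;p data-ke-size=&quot;size16&quot;&gt;The difference between the two reward types can be sketched as follows. This is a toy stand-in, not a trained model: the outcome reward is approximated by an exact string match against a reference answer, and the process reward assumes per-step scores are already given and aggregates them with a minimum (one common choice; a product is another). All names and values are hypothetical.&lt;/p&gt;

```python
def outcome_reward(final_answer, reference_answer):
    """Outcome-style reward: 1.0 only when the final answer matches."""
    return 1.0 if final_answer.strip() == reference_answer.strip() else 0.0

def process_reward(step_scores):
    """Process-style reward: a chain is only as sound as its weakest step."""
    return min(step_scores) if step_scores else 0.0

# Hypothetical per-step scores, standing in for a trained PRM's outputs.
steps = [0.9, 0.2, 0.95]
print(outcome_reward("42", " 42 "), process_reward(steps))
```

&lt;p data-ke-size=&quot;size16&quot;&gt;In this example the final answer is correct, yet the process score stays low because one intermediate step is weak. That gap is the signal a process reward model adds over an outcome-only one.&lt;/p&gt;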
&lt;blockquote style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;blockquote&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot;&gt;
&lt;p data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;paragraph&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size16&quot;&gt;Reward-based training can improve reasoning quality, safety, and consistency&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size26&quot;&gt;5. Test-Time Scaling (TTS): Optimization at Inference Time&lt;/h2&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;paragraph&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size16&quot;&gt;TTS&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;leaves the model parameters unchanged&lt;/b&gt; and improves reasoning by adjusting how much compute is spent at inference time or by applying different inference strategies.&lt;/p&gt;
&lt;div style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;table&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;div data-testid=&quot;table-alignment-container&quot;&gt;
&lt;div data-testid=&quot;table-container&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot;&gt;
&lt;div&gt;&amp;nbsp;&lt;/div&gt;
&lt;div&gt;&lt;br /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-table-width=&quot;1011&quot; data-table-local-id=&quot;fc19886b-5ea2-4461-b5bf-0290628f1cc4&quot; data-autosize=&quot;false&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;span&gt;&lt;b&gt;Category&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;&lt;span&gt;Representative techniques&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;&lt;span&gt;Description&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;&lt;span&gt;Characteristics&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Sampling&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Best-of-N, Confidence Sampling&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Pick the best of several responses by probability or reward score&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Simple; yields diverse responses&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Decoding&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Beam Search, Self-Consistency&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Explore multiple paths, then pick the most consistent answer&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Strong at structured search&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Reasoning&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;CoT, ToT, GoT&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Elicit step-by-step thinking and extend it into tree- or graph-structured reasoning&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Effective on complex problems&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Search/Verifier&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;MCTS, Verifier Search&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Try many paths and evaluate them with a reward model&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;High accuracy, compute cost &amp;uarr;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Self-Improvement&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Self-Refinement, Sequential Revision&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;The model revises its own response&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Iterative improvement; intuitive&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;&lt;b&gt;Compute optimization&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;COS (Compute Optimal Scaling)&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Choose the compute strategy according to problem difficulty&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Maximizes efficiency while preserving performance&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;blockquote style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;blockquote&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot;&gt;
&lt;p data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;paragraph&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size16&quot;&gt;TTS yields large gains especially on math problems, logical reasoning, and code generation, and in some cases outperforms a model 14 times larger&lt;/p&gt;
&lt;/blockquote&gt;
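&lt;p data-ke-size=&quot;size16&quot;&gt;Two of the simplest strategies in the table, Best-of-N and Self-Consistency, can be sketched in a few lines of Python. The sketch assumes the candidate responses have already been sampled and scored; the sample data and the scoring function are hypothetical placeholders for a real generator and reward model.&lt;/p&gt;

```python
from collections import Counter

def best_of_n(candidates, score_fn):
    """Return the candidate that the scoring function rates highest."""
    return max(candidates, key=score_fn)

def self_consistency(final_answers):
    """Return the most frequent final answer across sampled reasoning paths."""
    return Counter(final_answers).most_common(1)[0][0]

# Hypothetical samples: (final answer, reward-model score) pairs.
samples = [("41", 0.35), ("42", 0.80), ("42", 0.65), ("40", 0.10)]
best = best_of_n(samples, score_fn=lambda pair: pair[1])
voted = self_consistency([answer for answer, _ in samples])
print(best[0], voted)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Best-of-N needs an external scorer, while Self-Consistency only needs the final answers to be comparable, which is why the latter is often the cheaper starting point.&lt;/p&gt;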
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size26&quot;&gt;6. Comparison with Conventional Training Methods&lt;/h2&gt;
&lt;div style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;table&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;div data-testid=&quot;table-alignment-container&quot;&gt;
&lt;div data-testid=&quot;table-container&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot;&gt;
&lt;div&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%; height: 108px;&quot; border=&quot;1&quot; data-table-width=&quot;1011&quot; data-table-local-id=&quot;b77ead3d-7a0a-4e66-b6f2-fedb8add0395&quot; data-autosize=&quot;false&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr style=&quot;height: 18px;&quot;&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;&lt;span&gt;Aspect&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;&lt;span&gt;Conventional training &lt;span style=&quot;color: #000000; text-align: start;&quot;&gt;(SFT/RLHF)&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;&lt;span&gt;&lt;span style=&quot;color: #000000; text-align: start;&quot;&gt;Test-Time Scaling (TTS)&lt;/span&gt;&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;When compute is spent&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;High cost at training time&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Variable cost at inference time&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Flexibility&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Requires retraining&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Strategy can be adjusted per situation&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Scalability&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Hard to transfer across domains&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Can scale with input difficulty&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Cost-effectiveness&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;High up-front cost&lt;/span&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;span&gt;Favorable (can be optimized)&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size26&quot;&gt;7. Real-World Examples and Performance Gains&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;GPT-4 / GPT-4 Turbo&lt;/b&gt;: final answer chosen through self-consistency and verifier-based selection&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;WebGPT&lt;/b&gt;: Best-of-N combined with a reward model improves QA accuracy&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;DeepSeek-R1&lt;/b&gt;: GRPO-based reinforcement learning combined with long chain-of-thought reasoning at inference time to maximize reasoning performance&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;CoT + ToT + MCTS combination&lt;/b&gt;: markedly higher accuracy on math and logic problems&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;blockquote&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot;&gt;
&lt;p data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;paragraph&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size16&quot;&gt;COS reaches the same accuracy as Best-of-N with about 4 times less compute&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size26&quot;&gt;8. Conclusion and Summary&lt;/h2&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;paragraph&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size16&quot;&gt;Post-training strategies are essential in modern LLM systems for the following reasons:&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;bulletList&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Fine-Tuning&lt;/b&gt;: adapts the model's base capabilities to the target task&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Reinforcement Learning&lt;/b&gt;: aligns the model with user values and preferences&lt;/li&gt;
&lt;li data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;listItem&quot; data-prosemirror-content-type=&quot;node&quot;&gt;&lt;b&gt;Test-Time Scaling&lt;/b&gt;: enables active optimization during inference&lt;/li&gt;
&lt;/ul&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;paragraph&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size16&quot;&gt;These three axes are&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;complementary&lt;/b&gt;, and modern models (GPT-4, Gemini, DeepSeek, etc.) combine them to secure&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;performance, efficiency, and stability&lt;/b&gt; at the same time.&lt;/p&gt;
&lt;blockquote style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;blockquote&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot;&gt;
&lt;p data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;paragraph&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size16&quot;&gt;✅ Future LLM systems are likely to evolve toward hybrid strategies that combine training-time and inference-time optimization&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Next posts&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://datacook.tistory.com/132&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;2025.03.27 - [AI와 함께] - LLM 개발 및 활용을 위한 대표 기술 및 프레임워크 개요&lt;/a&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://datacook.tistory.com/134&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;2025.03.27 - [분류 전체보기] - Introduction: Why Post-training for LLMs Matters&lt;/a&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>AI와 함께</category>
      <category>llm fine-tuning</category>
      <category>llm 사후 학습</category>
      <category>RLHF</category>
      <category>sft</category>
      <category>TTS</category>
      <author>Joon09</author>
      <guid isPermaLink="true">https://datacook.tistory.com/133</guid>
      <comments>https://datacook.tistory.com/133#entry133comment</comments>
      <pubDate>Thu, 27 Mar 2025 11:11:06 +0900</pubDate>
    </item>
    <item>
      <title>LLM 개발 및 활용을 위한 대표 기술 및 프레임워크 개요</title>
      <link>https://datacook.tistory.com/132</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;1.&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Parameter-Efficient Fine-Tuning &amp;amp; Model Compression&lt;/b&gt;&lt;/h3&gt;
&lt;div style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;table&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;div data-testid=&quot;table-alignment-container&quot;&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div data-testid=&quot;table-container&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot;&gt;
&lt;div&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-table-width=&quot;990&quot; data-table-local-id=&quot;25c56605-4617-4525-9949-933baab30986&quot; data-autosize=&quot;false&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;324&quot;&gt;&lt;span&gt;&lt;b&gt;LoRA&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;665&quot;&gt;&lt;span&gt;Inserts trainable low-rank adapters while the base weights stay frozen, for efficient fine-tuning&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;324&quot;&gt;&lt;span&gt;&lt;b&gt;QLoRA&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;665&quot;&gt;&lt;span&gt;Combines 4-bit quantization with LoRA so that tuning is possible even on consumer GPUs&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;324&quot;&gt;&lt;span&gt;&lt;b&gt;GPTQ&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;665&quot;&gt;&lt;span&gt;Post-training quantization method suited to GPT-style models; shrinks the model with minimal accuracy loss&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;324&quot;&gt;&lt;span&gt;&lt;b&gt;SparseGPT&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;665&quot;&gt;&lt;span&gt;Shrinks the model by pruning unimportant parameters&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;324&quot;&gt;&lt;span&gt;&lt;b&gt;PEFT (HF)&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;665&quot;&gt;&lt;span&gt;Hugging Face library that brings together a range of parameter-efficient tuning methods&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;324&quot;&gt;&lt;span&gt;&lt;b&gt;BitsAndBytes&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;665&quot;&gt;&lt;span&gt;Saves memory through 8-bit optimizers and 4-bit quantization&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;324&quot;&gt;&lt;span&gt;&lt;b&gt;AdaLoRA&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;665&quot;&gt;&lt;span&gt;Adaptively allocates the LoRA rank budget across network layers&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;324&quot;&gt;&lt;span&gt;&lt;b&gt;P-Tuning v2&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;665&quot;&gt;&lt;span&gt;Fine-tunes through trainable continuous prompts&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;2.&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Data Management &amp;amp; Preprocessing&lt;/b&gt;&lt;/h3&gt;
&lt;div style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;table&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;div data-testid=&quot;table-alignment-container&quot;&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div data-testid=&quot;table-container&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot;&gt;
&lt;div&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-table-width=&quot;1002&quot; data-table-local-id=&quot;89fcdff1-8dc3-4009-b020-8c3624bedcd8&quot; data-autosize=&quot;false&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;329&quot;&gt;&lt;span&gt;&lt;b&gt;HF Datasets&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;672&quot;&gt;&lt;span&gt;Streaming and versioning APIs for more than 30,000 datasets&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;329&quot;&gt;&lt;span&gt;&lt;b&gt;WebDataset&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;672&quot;&gt;&lt;span&gt;tar-based streaming format optimized for large-scale training&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;329&quot;&gt;&lt;span&gt;&lt;b&gt;DVC&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;672&quot;&gt;&lt;span&gt;Git-style data versioning with pipeline tracking&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;329&quot;&gt;&lt;span&gt;&lt;b&gt;Apache Arrow&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;672&quot;&gt;&lt;span&gt;High-performance columnar in-memory format for efficient data access&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;329&quot;&gt;&lt;span&gt;&lt;b&gt;Zstandard&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;672&quot;&gt;&lt;span&gt;Fast compression algorithm that reduces data transfer and storage costs&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;329&quot;&gt;&lt;span&gt;&lt;b&gt;Cleanlab&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;672&quot;&gt;&lt;span&gt;Automatically detects label errors and outliers to support data cleaning&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;3.&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Distributed Training &amp;amp; Optimization&lt;/b&gt;&lt;/h3&gt;
&lt;div style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;table&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;div data-testid=&quot;table-alignment-container&quot;&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div data-testid=&quot;table-container&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot;&gt;
&lt;div&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-table-width=&quot;1011&quot; data-table-local-id=&quot;225b865a-35f6-46fe-88ef-ec53cdfce668&quot; data-autosize=&quot;false&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;335&quot;&gt;&lt;span&gt;&lt;b&gt;DeepSpeed&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;675&quot;&gt;&lt;span&gt;Training optimization engine for large models, including ZeRO parallelism and memory optimizations&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;335&quot;&gt;&lt;span&gt;&lt;b&gt;Megatron-LM&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;675&quot;&gt;&lt;span&gt;NVIDIA&#39;s framework for parallel training of large-scale transformer models&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;335&quot;&gt;&lt;span&gt;&lt;b&gt;Colossal-AI&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;675&quot;&gt;&lt;span&gt;Unified distributed training system supporting multiple parallelism strategies&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;335&quot;&gt;&lt;span&gt;&lt;b&gt;Horovod&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;675&quot;&gt;&lt;span&gt;MPI-based framework for synchronous training across multiple GPUs and nodes&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;335&quot;&gt;&lt;span&gt;&lt;b&gt;Ray&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;675&quot;&gt;&lt;span&gt;General-purpose framework for distributed Python applications&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;4.&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Efficient Inference &amp;amp; Deployment&lt;/b&gt;&lt;/h3&gt;
&lt;div style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;table&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;div data-testid=&quot;table-alignment-container&quot;&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div data-testid=&quot;table-container&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot;&gt;
&lt;div&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-table-width=&quot;1011&quot; data-table-local-id=&quot;309316f2-9b0b-4771-9a86-e84a96754653&quot; data-autosize=&quot;false&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;338&quot;&gt;&lt;span&gt;&lt;b&gt;vLLM&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;672&quot;&gt;&lt;span&gt;Fast LLM inference built on PagedAttention&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;338&quot;&gt;&lt;span&gt;&lt;b&gt;TensorRT&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;672&quot;&gt;&lt;span&gt;Inference optimization and kernel fusion for NVIDIA GPUs&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;338&quot;&gt;&lt;span&gt;&lt;b&gt;Triton&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;672&quot;&gt;&lt;span&gt;AI inference server framework that handles concurrent requests&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;338&quot;&gt;&lt;span&gt;&lt;b&gt;ONNX&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;672&quot;&gt;&lt;span&gt;Open model format which, together with ONNX Runtime, makes inference portable across different hardware&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;338&quot;&gt;&lt;span&gt;&lt;b&gt;OpenVINO&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;672&quot;&gt;&lt;span&gt;Optimized runtime with quantization support for Intel CPUs and integrated GPUs&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;338&quot;&gt;&lt;span&gt;&lt;b&gt;XNNPACK&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;672&quot;&gt;&lt;span&gt;High-performance kernel implementations for ARM-based devices&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;338&quot;&gt;&lt;span&gt;&lt;b&gt;Groq&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;672&quot;&gt;&lt;span&gt;Ultra-low-latency AI inference system built on a dedicated tensor streaming processor&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;5.&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Integrated Development Ecosystems&lt;/b&gt;&lt;/h3&gt;
&lt;div style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;table&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;div data-testid=&quot;table-alignment-container&quot;&gt;
&lt;div&gt;
&lt;div&gt;
&lt;div data-testid=&quot;table-container&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot;&gt;
&lt;div&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-table-width=&quot;1011&quot; data-table-local-id=&quot;46ec6b13-dab4-48db-b2e8-12a0b508b995&quot; data-autosize=&quot;false&quot; data-layout=&quot;center&quot; data-number-column=&quot;false&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;341&quot;&gt;&lt;span&gt;&lt;b&gt;HF Ecosystem&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;669&quot;&gt;&lt;span&gt;Hugging Face&#39;s integrated environment of models, datasets, and inference APIs&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;341&quot;&gt;&lt;span&gt;&lt;b&gt;DeepSpeed&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;669&quot;&gt;&lt;span&gt;Microsoft solution that covers the path from training through inference&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;341&quot;&gt;&lt;span&gt;&lt;b&gt;PyTorch&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;669&quot;&gt;&lt;span&gt;General-purpose deep learning framework widely used for LLM development&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableRow&quot; data-prosemirror-content-type=&quot;node&quot;&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;341&quot;&gt;&lt;span&gt;&lt;b&gt;LLM Reasoners&lt;/b&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;tableCell&quot; data-prosemirror-content-type=&quot;node&quot; data-colwidth=&quot;669&quot;&gt;&lt;span&gt;Lets you assemble engines for search-based advanced reasoning&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;hr data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;rule&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;heading&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size23&quot;&gt;✳️ Summary&lt;/h3&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-prosemirror-node-block=&quot;true&quot; data-prosemirror-node-name=&quot;paragraph&quot; data-prosemirror-content-type=&quot;node&quot; data-ke-size=&quot;size16&quot;&gt;Advancing LLMs takes more than training a model:&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;tuning efficiency, data quality management, distributed optimization, real-time inference performance&lt;/b&gt;, and&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;ecosystem integration&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;all matter. Each technology was built for a specific purpose, and combining them appropriately maximizes an LLM&#39;s practical applicability and productivity.&lt;/p&gt;</description>
      <category>AI와 함께</category>
      <category>LLM 개발</category>
      <category>llm 대표 기술</category>
      <category>lora</category>
      <author>Joon09</author>
      <guid isPermaLink="true">https://datacook.tistory.com/132</guid>
      <comments>https://datacook.tistory.com/132#entry132comment</comments>
      <pubDate>Thu, 27 Mar 2025 11:05:24 +0900</pubDate>
    </item>
    <item>
      <title>✅ Sentence Transformers: Bi-Encoder vs Cross-Encoder, a Comparative Analysis</title>
      <link>https://datacook.tistory.com/131</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;1. Introduction&lt;/h2&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;Sentence Transformers is a widely used sentence-embedding framework that represents the meaning of text as vectors for a range of natural language processing (NLP) applications. It provides two main architectures,&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Bi-Encoder&lt;/b&gt;&lt;span&gt;&amp;nbsp;&lt;/span&gt;and&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;Cross-Encoder&lt;/b&gt;, which complement each other in accuracy, speed, and scalability. This report compares how each model works, its strengths and weaknesses, its training data format, and its practical use cases.&lt;/p&gt;
&lt;hr data-ke-style=&quot;style1&quot; /&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;2. How They Work: Architecture Comparison&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;Bi-Encoder&lt;/b&gt; (embedding model):&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;embeds each sentence independently&lt;/b&gt;, then computes similarity with a metric such as cosine similarity.
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;✔ Fast similarity computation&lt;/li&gt;
&lt;li&gt;✔ Vectors can be precomputed and reused&lt;/li&gt;
&lt;li&gt;❌ Comparatively weaker at capturing context shared between the two sentences&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Cross-Encoder&lt;/b&gt; (reranker model):&lt;span&gt;&amp;nbsp;&lt;/span&gt;&lt;b&gt;feeds the sentence pair to the Transformer as a single input&lt;/b&gt;, so the model judges the relationship between the two sentences directly.
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;✔ Fine-grained analysis of contextual relationships&lt;/li&gt;
&lt;li&gt;❌ Every sentence pair must be scored separately &amp;rarr; computationally heavy and slow&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
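&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;The difference in data flow can be sketched with standard-library Python. The functions embed and cross_score below are hypothetical toy stand-ins (a bag-of-words vector and a token-overlap ratio), not Transformer models and not the Sentence Transformers API; the sketch only shows where each approach does its work.&lt;/p&gt;

```python
import math
from collections import Counter

def embed(sentence):
    """Toy stand-in for a Bi-Encoder: maps ONE sentence to a sparse count vector."""
    return Counter(sentence.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def cross_score(a, b):
    """Toy stand-in for a Cross-Encoder: sees BOTH sentences at once (Jaccard overlap)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta.intersection(tb)) / len(union) if union else 0.0

corpus = ["the cat sat on the mat", "stock prices fell sharply"]
query = "a cat on a mat"

# Bi-Encoder flow: corpus vectors are computed once and can be cached or indexed.
corpus_vecs = [embed(doc) for doc in corpus]
query_vec = embed(query)
bi_scores = [cosine(query_vec, vec) for vec in corpus_vecs]

# Cross-Encoder flow: every (query, document) pair is scored from scratch.
cross_scores = [cross_score(query, doc) for doc in corpus]
```

&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;In the Bi-Encoder flow only the query needs encoding at search time, whereas the Cross-Encoder flow has nothing to cache because its input is always the pair.&lt;/p&gt;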
&lt;hr data-ke-style=&quot;style1&quot; /&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;3. Performance and Efficiency&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;color: #000000; text-align: start; border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Aspect&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;span style=&quot;color: #000000; text-align: start;&quot;&gt;Bi-Encoder&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;&lt;span style=&quot;color: #000000; text-align: start;&quot;&gt;Cross-Encoder&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Accuracy&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;Moderate (captures meaning well, but no interaction between the two sentences)&lt;/td&gt;
&lt;td&gt;Very high (can reflect subtle contextual differences)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Speed&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;Very fast&lt;/td&gt;
&lt;td&gt;Slow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Scalability&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;High (can handle millions of documents)&lt;/td&gt;
&lt;td&gt;Low (comparing n sentences pairwise takes O(n&amp;sup2;) forward passes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Compute resources&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;Low (MiniLM-class models are sufficient)&lt;/td&gt;
&lt;td&gt;High (a full forward pass per pair, often with a larger BERT-based model)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Embedding reuse&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;Possible (easy to build and query a vector DB)&lt;/td&gt;
&lt;td&gt;Not possible (recomputed for every sentence pair)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
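&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;To make the scalability row concrete, the sketch below counts model forward passes when all n sentences are compared with one another. The function names are illustrative, and the count assumes unordered pairs.&lt;/p&gt;

```python
def bi_encoder_passes(n):
    # Each sentence is encoded once; the similarities are cheap vector operations.
    return n

def cross_encoder_passes(n):
    # Every unordered pair needs its own forward pass: n * (n - 1) / 2.
    return n * (n - 1) // 2

for n in (100, 10_000):
    print(n, bi_encoder_passes(n), cross_encoder_passes(n))
# prints:
# 100 100 4950
# 10000 10000 49995000
```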
&lt;hr data-ke-style=&quot;style1&quot; /&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;4. Training Methods and Dataset Format&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;color: #000000; text-align: start; border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Aspect&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;span style=&quot;color: #000000; text-align: start;&quot;&gt;Bi-Encoder&lt;/span&gt;&lt;/td&gt;
&lt;td&gt;&lt;span style=&quot;color: #000000; text-align: start;&quot;&gt;Cross-Encoder&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Input data format&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;(anchor, positive, negative) triplets, or similar/dissimilar pairs&lt;/td&gt;
&lt;td&gt;(sentence 1, sentence 2, label or similarity score)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Loss function&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;TripletLoss, MultipleNegativesRankingLoss, etc.&lt;/td&gt;
&lt;td&gt;Cross-entropy for classification labels, MSE for similarity scores, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Evaluation metric&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;Scores based on embedding similarity&lt;/td&gt;
&lt;td&gt;Classification accuracy, MSE, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Training objective&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;Pull similar sentences together and push dissimilar ones apart&lt;/td&gt;
&lt;td&gt;Precisely classify or score the relationship within a sentence pair&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
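&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;The Bi-Encoder objective of pulling similar sentences together and pushing dissimilar ones apart can be illustrated with a generic triplet loss on (anchor, positive, negative) vectors. This is a minimal sketch that assumes Euclidean distance and an arbitrary margin of 1.0; it is not the library&#39;s implementation, and the two-dimensional vectors are made up for illustration.&lt;/p&gt;

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(d(a, p) - d(a, n) + margin, 0): zero once the negative is at
    least `margin` farther from the anchor than the positive is."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)

anchor, positive = [0.0, 0.0], [0.0, 1.0]
far_negative, near_negative = [3.0, 4.0], [0.0, 1.5]

# Negative already far away: 1 - 5 + 1 is below zero, so the loss is clipped to 0.
loss_far = triplet_loss(anchor, positive, far_negative)
# Negative too close: 1 - 1.5 + 1 = 0.5, so training would push it away.
loss_near = triplet_loss(anchor, positive, near_negative)
```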
&lt;hr data-ke-style=&quot;style1&quot; /&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;5. Practical Use Cases&lt;/h2&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;table style=&quot;color: #000000; text-align: start; border-collapse: collapse; width: 100%; height: 90px;&quot; border=&quot;1&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr style=&quot;height: 18px;&quot;&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;&lt;b&gt;Use case&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Bi-Encoder&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Cross-Encoder&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot;&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;&lt;b&gt;Large-scale semantic search&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;✅ Suitable (vector-based search engines, document recommendation, etc.)&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;❌ Unsuitable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot;&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;&lt;b&gt;Question answering systems&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Candidate document retrieval (Retrieve stage)&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Final answer selection (Re-Rank stage)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot;&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;&lt;b&gt;STS/NLI classification&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Classification based on embedding similarity&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Well suited to precise sentence-relationship classification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot;&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;&lt;b&gt;Medical/legal document analysis&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;❌ Unsuitable for precise word-level semantic analysis&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;✅ Important (can reflect subtle differences in meaning)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr data-ke-style=&quot;style1&quot; /&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;6. Combined Strategy: Retrieve &amp;amp; Re-Rank&lt;/h2&gt;
&lt;ul style=&quot;list-style-type: disc; color: #000000; text-align: start;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;b&gt;Stage 1 (Retrieve)&lt;/b&gt;: Use the Bi-Encoder to quickly retrieve semantically similar candidates from the whole corpus&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Stage 2 (Re-Rank)&lt;/b&gt;: Use the Cross-Encoder to score those candidates precisely and reorder them&lt;/li&gt;
&lt;/ul&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;This strategy combines the &lt;b&gt;large-scale efficiency of the Bi-Encoder&lt;/b&gt; with the &lt;b&gt;accuracy of the Cross-Encoder&lt;/b&gt;, and it is a widely used pattern in production search systems.&lt;/p&gt;
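&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;The two stages can be sketched end to end as follows. As assumptions for illustration, embed is a bag-of-words stand-in for a Bi-Encoder and cross_score is a word-order-sensitive (bigram overlap) stand-in for a Cross-Encoder; retrieve_and_rerank and top_k are hypothetical names, not library API.&lt;/p&gt;

```python
import math
from collections import Counter

def embed(text):
    # Hypothetical Bi-Encoder stand-in: bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def cross_score(query, doc):
    # Hypothetical Cross-Encoder stand-in: counts bigrams shared by query and document.
    q, d = query.lower().split(), doc.lower().split()
    return len(set(zip(q, q[1:])).intersection(zip(d, d[1:])))

def retrieve_and_rerank(query, corpus, doc_vecs, top_k=2):
    q_vec = embed(query)
    # Stage 1 (Retrieve): cheap similarity against precomputed document vectors.
    order = sorted(range(len(corpus)), key=lambda i: cosine(q_vec, doc_vecs[i]), reverse=True)
    candidates = [corpus[i] for i in order[:top_k]]
    # Stage 2 (Re-Rank): expensive pair scoring, applied to the short list only.
    return sorted(candidates, key=lambda doc: cross_score(query, doc), reverse=True)

corpus = ["dog bites man", "man bites dog", "markets rallied today"]
doc_vecs = [embed(doc) for doc in corpus]  # computed once, reusable across queries
ranked = retrieve_and_rerank("man bites dog", corpus, doc_vecs)
```

&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;In this toy setup the first stage cannot tell the two word orders apart because their bag-of-words vectors are identical, and the second stage moves the exact match to the top.&lt;/p&gt;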
&lt;hr data-ke-style=&quot;style1&quot; /&gt;
&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;7. Conclusion and Model Selection Guide&lt;/h2&gt;
&lt;table style=&quot;color: #000000; text-align: start; border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;b&gt;Consideration&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Bi-Encoder&lt;/b&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Cross-Encoder&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large data volumes&lt;/td&gt;
&lt;td&gt;✅ Very well suited&lt;/td&gt;
&lt;td&gt;❌ Unsuitable because of excessive computation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-time response&lt;/td&gt;
&lt;td&gt;✅ Fast&lt;/td&gt;
&lt;td&gt;❌ Slow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accuracy requirements&lt;/td&gt;
&lt;td&gt;❌ Comparatively low&lt;/td&gt;
&lt;td&gt;✅ High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fine-grained contextual analysis&lt;/td&gt;
&lt;td&gt;❌ Limited&lt;/td&gt;
&lt;td&gt;✅ Recommended when complex relationships must be understood precisely&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Limited compute resources&lt;/td&gt;
&lt;td&gt;✅ Low cost, high efficiency&lt;/td&gt;
&lt;td&gt;❌ Needs larger models and consumes more resources&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr data-ke-style=&quot;style1&quot; /&gt;
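&lt;h2 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size26&quot;&gt;✅ Final Summary Table&lt;/h2&gt;
&lt;table style=&quot;color: #000000; text-align: start; border-collapse: collapse; width: 100%; height: 154px;&quot; border=&quot;1&quot; data-ke-align=&quot;alignLeft&quot;&gt;
&lt;tbody&gt;
&lt;tr style=&quot;height: 18px;&quot;&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;&lt;b&gt;Aspect&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;&lt;b&gt;Bi-Encoder&lt;/b&gt;&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;&lt;b&gt;Cross-Encoder&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;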
&lt;tr style=&quot;height: 18px;&quot;&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;How it works&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Embeds each sentence independently&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Encodes the sentence pair jointly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot;&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Output&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;A fixed-size vector per sentence&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;A similarity score per sentence pair&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot;&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Similarity computation&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Cosine similarity or a similar metric&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Computed directly (evaluated inside the network)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot;&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Speed / scalability&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Fast / excellent&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Slow / low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot;&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Embedding storage/reuse&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Possible&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Not possible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot;&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Accuracy&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Moderate (prioritizes fast computation)&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;High (prioritizes precise contextual analysis)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot;&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Main use cases&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Large-scale semantic search, recommendation systems, etc.&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;QA re-ranking, similarity judgments on legal/medical text, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr style=&quot;height: 18px;&quot;&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Best-fit problem type&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Tasks driven by scalability and speed&lt;/td&gt;
&lt;td style=&quot;height: 18px;&quot;&gt;Tasks driven by precision and accuracy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>AI와 함께</category>
      <category>bi-encoder</category>
      <category>cross-encoder</category>
      <category>embedding model</category>
      <category>rerank model</category>
      <category>sentence transformers</category>
      <author>Joon09</author>
      <guid isPermaLink="true">https://datacook.tistory.com/131</guid>
      <comments>https://datacook.tistory.com/131#entry131comment</comments>
      <pubDate>Thu, 27 Mar 2025 11:01:51 +0900</pubDate>
    </item>
    <item>
      <title>A 4-Stage Checklist for Efficient AI Agent Development</title>
      <link>https://datacook.tistory.com/130</link>
      <description>&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1024&quot; data-origin-height=&quot;1024&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/OUl8M/btsLdwSCxJC/jZIshOSdWmPxABqqKmQh41/img.webp&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/OUl8M/btsLdwSCxJC/jZIshOSdWmPxABqqKmQh41/img.webp&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/OUl8M/btsLdwSCxJC/jZIshOSdWmPxABqqKmQh41/img.webp&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FOUl8M%2FbtsLdwSCxJC%2FjZIshOSdWmPxABqqKmQh41%2Fimg.webp&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1024&quot; height=&quot;1024&quot; data-origin-width=&quot;1024&quot; data-origin-height=&quot;1024&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1280&quot; data-origin-height=&quot;1600&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/MbKeR/btsLcMIFzXP/p0z4bYff08KNzF0ffI88G1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/MbKeR/btsLcMIFzXP/p0z4bYff08KNzF0ffI88G1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/MbKeR/btsLcMIFzXP/p0z4bYff08KNzF0ffI88G1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FMbKeR%2FbtsLcMIFzXP%2Fp0z4bYff08KNzF0ffI88G1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;689&quot; height=&quot;861&quot; data-origin-width=&quot;1280&quot; data-origin-height=&quot;1600&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;Stage 1: Problem Definition and Data Preparation&lt;/b&gt;&lt;/h3&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;In this stage you clearly define the problem the AI agent is meant to solve, then collect and prepare the relevant data.&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal; color: #000000; text-align: start;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;&lt;b&gt;Define the purpose&lt;/b&gt;&lt;br /&gt;Clearly set the specific purpose, tasks, and goals the AI agent must achieve.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Collect data&lt;/b&gt;&lt;br /&gt;Gather diverse, representative data suited to the task for both training and evaluation.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Clean the data&lt;/b&gt;&lt;br /&gt;Remove unnecessary or low-quality data to improve the accuracy of model training.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Feature engineering&lt;/b&gt;&lt;br /&gt;Identify the key features for the agent's domain and preprocess them so the model can learn from them.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Set up a knowledge base&lt;/b&gt;&lt;br /&gt;Systematically build the task-related knowledge the agent can draw on (for example, a semantic search database or graph-based knowledge).&lt;/li&gt;
&lt;/ol&gt;
&lt;hr data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;Stage 2: Model Fine-Tuning and Integration&lt;/b&gt;&lt;/h3&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;In this stage you select an AI model, adapt it to the task, and integrate it into the system environment.&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal; color: #000000; text-align: start;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;&lt;b&gt;Select a model&lt;/b&gt;&lt;br /&gt;Choose between a pre-trained model and a custom model, whichever fits the agent's purpose.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Fine-tuning&lt;/b&gt;&lt;br /&gt;Train the model further on task-specific data so that it performs better in the target domain.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Behavior training&lt;/b&gt;&lt;br /&gt;Use techniques such as reinforcement learning to train the agent to make better decisions for each situation.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Memory management&lt;/b&gt;&lt;br /&gt;Design the agent to use short-term and long-term memory so that it can maintain context and adapt to the situation.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Tool and API integration&lt;/b&gt;&lt;br /&gt;Integrate the agent so that it interacts smoothly with external systems, databases, and APIs.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Multi-agent collaboration&lt;/b&gt;&lt;br /&gt;Enable multiple AI agents to cooperate and coordinate with one another to carry out tasks.&lt;/li&gt;
&lt;/ol&gt;
&lt;hr data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;Stage 3: Validation and Optimization&lt;/b&gt;&lt;/h3&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;In this stage you test and optimize the agent's performance, stability, and efficiency.&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal; color: #000000; text-align: start;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;&lt;b&gt;Performance testing&lt;/b&gt;&lt;br /&gt;Evaluate the agent's accuracy, speed, and resource efficiency under a variety of conditions.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Tool validation&lt;/b&gt;&lt;br /&gt;Verify that the external tools and APIs integrated with the agent work correctly.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Multimodal integration&lt;/b&gt;&lt;br /&gt;Combine data formats such as vision, text, and speech to enable richer, more dynamic interaction.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Resource management&lt;/b&gt;&lt;br /&gt;Optimize compute resources such as memory and CPU/GPU usage while maintaining performance.&lt;/li&gt;
&lt;/ol&gt;
&lt;hr data-ke-style=&quot;style1&quot; /&gt;
&lt;h3 style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size23&quot;&gt;&lt;b&gt;Stage 4: Learning and Updating&lt;/b&gt;&lt;/h3&gt;
&lt;p style=&quot;color: #000000; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;In this stage you improve the agent continuously through feedback and adapt it to changing requirements.&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal; color: #000000; text-align: start;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;&lt;b&gt;Feedback loop&lt;/b&gt;&lt;br /&gt;Collect user feedback to identify weaknesses and areas that need improvement.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Monitoring and evaluation&lt;/b&gt;&lt;br /&gt;Track and evaluate the agent's performance regularly against predefined metrics.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Continuous fine-tuning&lt;/b&gt;&lt;br /&gt;Keep updating the model as new data arrives or requirements change.&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Failure recovery&lt;/b&gt;&lt;br /&gt;Identify where the agent fails and build mechanisms that let it recover.&lt;/li&gt;
&lt;/ol&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;text-align: left;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style5&quot; /&gt;
&lt;p style=&quot;text-align: left;&quot; data-ke-size=&quot;size16&quot;&gt;&lt;b&gt;** This content was written with gpt-4o.&lt;/b&gt;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Machine Learning</category>
      <category>ai agent 개발</category>
      <category>ai 모델 개발</category>
      <author>Joon09</author>
      <guid isPermaLink="true">https://datacook.tistory.com/130</guid>
      <comments>https://datacook.tistory.com/130#entry130comment</comments>
      <pubDate>Mon, 9 Dec 2024 16:50:47 +0900</pubDate>
    </item>
    <item>
      <title>The Journey Toward Perfection: What Life and Deep Learning Have in Common</title>
      <link>https://datacook.tistory.com/129</link>
      <description>&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-filename=&quot;blog_image01.jpeg&quot; data-origin-width=&quot;2048&quot; data-origin-height=&quot;2048&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/GVpfE/btsKM0tpbEN/2p64352QMiELWFjqurykC1/img.jpg&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/GVpfE/btsKM0tpbEN/2p64352QMiELWFjqurykC1/img.jpg&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/GVpfE/btsKM0tpbEN/2p64352QMiELWFjqurykC1/img.jpg&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FGVpfE%2FbtsKM0tpbEN%2F2p64352QMiELWFjqurykC1%2Fimg.jpg&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;2048&quot; height=&quot;2048&quot; data-filename=&quot;blog_image01.jpeg&quot; data-origin-width=&quot;2048&quot; data-origin-height=&quot;2048&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;People are not perfect.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Because we are not perfect, we repeat mistakes every day, move forward through those mistakes, and learn something new.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Along the way we try to become better people, and we try to become perfect.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;People cannot become perfect.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Even if I feel perfect from my own point of view,&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;from someone else's point of view I am still an incomplete being.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;In deep learning, computing the loss starts with looking at the difference between the ground truth and the predicted result.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The model then improves by narrowing that difference.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Isn't our life the same?&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;To become a good person, define what kind of good person you want to be,&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;then subtract who you are now from that definition.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;That shows you objectively where you fall short.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;How to narrow the gap between the good person I want to become and who I am now is then my own choice.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The way to reduce the gap is to choose a good optimizer.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;My life will turn out differently depending on which optimizer I choose.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;What will you value most?&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Family, partner, friends, work, money: it is up to you to decide which of the things that matter in your life becomes a key element of your optimizer.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;You do not have to think that you are living a poor or bad life right now.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;New models keep coming out, and new methodologies keep appearing.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Each time, you can discard the model you had, that is, your old mindset, and live with a new one.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;With the new mindset, you can put down bad habits and fill their place with good ones.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;hr contenteditable=&quot;false&quot; data-ke-type=&quot;horizontalRule&quot; data-ke-style=&quot;style5&quot; /&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://new.express.adobe.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;Image generation&lt;/a&gt;&lt;/p&gt;</description>
      <category>걸으며 생각한 것들</category>
      <category>사람과 딥러닝</category>
      <category>완벽을 향한 여정</category>
      <category>인생과 딥러닝의 공통점</category>
      <author>Joon09</author>
      <guid isPermaLink="true">https://datacook.tistory.com/129</guid>
      <comments>https://datacook.tistory.com/129#entry129comment</comments>
      <pubDate>Sun, 17 Nov 2024 18:02:57 +0900</pubDate>
    </item>
    <item>
      <title>01. Getting Started with MMDetection (Object Detection)</title>
      <link>https://datacook.tistory.com/128</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;mmdetection은 오픈소스 객체 감지 툴박스로, 주로 컴퓨터 비전 연구와 응용 프로그램 개발에 사용됩니다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;이 툴박스는 PyTorch 기반으로 구축되었으며, 다양한 객체 감지 모델과 알고리즘을 쉽게 구현할 수 있도록 설계되었습니다.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;mmdetection은 모듈화가 잘 되어 있어서 사용자가 다양한 구성요소를 쉽게 교체하거나 업그레이드할 수 있습니다.&lt;/p&gt;
&lt;p style=&quot;background-color: #fafafa; color: #212121; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;mmdetection에서는 주로 네 가지 주요 구성 요소를 사용합니다: Backbone, Neck, Head, Loss. 각각의 구성 요소는 다음과 같은 역할을 합니다:&lt;/p&gt;
&lt;ol style=&quot;list-style-type: decimal; background-color: #fafafa; color: #212121; text-align: start;&quot; data-ke-list-type=&quot;decimal&quot;&gt;
&lt;li&gt;&lt;b&gt;Backbone&lt;/b&gt;:
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The backbone is the base structure of the model and extracts high-level features from the input image. Commonly used backbones include ResNet, VGG, and MobileNet.&lt;/li&gt;
&lt;li&gt;Example: ResNet-50 is a 50-layer residual network that uses skip connections to make the layers easier to train.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Neck&lt;/b&gt;:
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The neck refines the features extracted by the backbone and combines features from multiple scales into more useful feature information. For example, FPN (Feature Pyramid Network) merges features across resolutions so that objects can be detected effectively at each resolution.&lt;/li&gt;
&lt;li&gt;Example: FPN connects feature maps of several scales through a top-down pathway, so detection stays effective even when object sizes vary widely.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Head&lt;/b&gt;:
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The head uses the feature maps to produce the final detection output. It can perform classification, regression, bounding-box prediction, and similar tasks. For example, Faster R-CNN uses an RPN (Region Proposal Network) and an RoI (Region of Interest) head.&lt;/li&gt;
&lt;li&gt;Example: the RPN proposes bounding boxes where objects are likely to be, and the RoI head predicts each object's class and precise location from those boxes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Loss&lt;/b&gt;:
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;The loss function measures how well the model's predictions match the ground truth, and the weights are adjusted during training based on it. Each task (for example, classification or bounding-box regression) uses a loss function suited to it.&lt;/li&gt;
&lt;li&gt;Example: Cross-Entropy Loss is used for classification, and Smooth L1 Loss is used for refining bounding-box positions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
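&lt;p data-ke-size=&quot;size16&quot;&gt;The two example losses named above can be written out by hand. This is a minimal pure-Python sketch for a single sample, not mmdetection's implementation: the function names are hypothetical, and averaging the Smooth L1 terms over the four box coordinates with &lt;code&gt;beta=1.0&lt;/code&gt; is an assumption made for illustration.&lt;/p&gt;

```python
import math


def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target_index]


def smooth_l1(pred_box, target_box, beta=1.0):
    """Smooth L1: quadratic for small errors, linear for large ones."""
    total = 0.0
    for pred, target in zip(pred_box, target_box):
        diff = abs(pred - target)
        if beta > diff:
            total += 0.5 * diff * diff / beta  # small error: quadratic branch
        else:
            total += diff - 0.5 * beta         # large error: linear branch
    return total / len(pred_box)               # assumption: mean over coordinates


cls_loss = cross_entropy([2.0, 0.5, -1.0], target_index=0)
box_loss = smooth_l1([0.0, 0.0, 0.0, 0.0], [0.5, 0.5, 2.0, 2.0])
print(cls_loss, box_loss)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;The quadratic branch keeps gradients small near the target, while the linear branch limits the influence of badly wrong boxes.&lt;/p&gt;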
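&lt;p data-ke-size=&quot;size16&quot;&gt;Being able to define a detector in terms of four broad modules, and to test each component separately, makes the toolbox look very approachable.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The test example is built around RTMDet; the figure highlights RTMDet-l (large), while the download code in this post uses the tiny variant.&lt;/p&gt;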
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1370&quot; data-origin-height=&quot;1082&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/Qgqw1/btsHnUjRAXJ/LmrWfgw7bgO0nxfKNKJ1qk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/Qgqw1/btsHnUjRAXJ/LmrWfgw7bgO0nxfKNKJ1qk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/Qgqw1/btsHnUjRAXJ/LmrWfgw7bgO0nxfKNKJ1qk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FQgqw1%2FbtsHnUjRAXJ%2FLmrWfgw7bgO0nxfKNKJ1qk%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1370&quot; height=&quot;1082&quot; data-origin-width=&quot;1370&quot; data-origin-height=&quot;1082&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Its performance looks comparable to the YOLO models.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The architecture is as follows.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1280&quot; data-origin-height=&quot;907&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bojjIA/btsHmF2h98c/D7yftKakI971mipIP92SuK/img.jpg&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bojjIA/btsHmF2h98c/D7yftKakI971mipIP92SuK/img.jpg&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bojjIA/btsHmF2h98c/D7yftKakI971mipIP92SuK/img.jpg&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbojjIA%2FbtsHmF2h98c%2FD7yftKakI971mipIP92SuK%2Fimg.jpg&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1090&quot; height=&quot;772&quot; data-origin-width=&quot;1280&quot; data-origin-height=&quot;907&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;It is made up of four modules.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Looking at this, swapping modules in and out does not look easy unless you specialize in vision.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The easiest approach is probably to take a structure that someone else has already put together well and fine-tune it.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The full tutorial code is available here.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://github.com/open-mmlab/mmdetection/blob/main/demo/MMDet_Tutorial.ipynb&quot; target=&quot;_blank&quot; rel=&quot;noopener&amp;nbsp;noreferrer&quot;&gt;https://github.com/open-mmlab/mmdetection/blob/main/demo/MMDet_Tutorial.ipynb&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1715585254387&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-ke-align=&quot;alignCenter&quot; data-og-type=&quot;object&quot; data-og-title=&quot;mmdetection/demo/MMDet_Tutorial.ipynb at main &amp;middot; open-mmlab/mmdetection&quot; data-og-description=&quot;OpenMMLab Detection Toolbox and Benchmark. Contribute to open-mmlab/mmdetection development by creating an account on GitHub.&quot; data-og-host=&quot;github.com&quot; data-og-source-url=&quot;https://github.com/open-mmlab/mmdetection/blob/main/demo/MMDet_Tutorial.ipynb&quot; data-og-url=&quot;https://github.com/open-mmlab/mmdetection/blob/main/demo/MMDet_Tutorial.ipynb&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/dRdeUX/hyV2AECAUc/2gQwmsyZifKPKRwoKaBSx1/img.png?width=1200&amp;amp;height=600&amp;amp;face=0_0_1200_600&quot;&gt;&lt;a href=&quot;https://github.com/open-mmlab/mmdetection/blob/main/demo/MMDet_Tutorial.ipynb&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://github.com/open-mmlab/mmdetection/blob/main/demo/MMDet_Tutorial.ipynb&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/dRdeUX/hyV2AECAUc/2gQwmsyZifKPKRwoKaBSx1/img.png?width=1200&amp;amp;height=600&amp;amp;face=0_0_1200_600');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot; data-ke-size=&quot;size16&quot;&gt;mmdetection/demo/MMDet_Tutorial.ipynb at main &amp;middot; open-mmlab/mmdetection&lt;/p&gt;
&lt;p class=&quot;og-desc&quot; data-ke-size=&quot;size16&quot;&gt;OpenMMLab Detection Toolbox and Benchmark. Contribute to open-mmlab/mmdetection development by creating an account on GitHub.&lt;/p&gt;
&lt;p class=&quot;og-host&quot; data-ke-size=&quot;size16&quot;&gt;github.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The following code downloads the model.&lt;/p&gt;
&lt;pre class=&quot;jboss-cli&quot; style=&quot;color: #212121; text-align: start;&quot;&gt;&lt;code&gt;# We download the pre-trained checkpoints for inference and finetuning.
!mkdir ./checkpoints
!mim download mmdet --config rtmdet_tiny_8xb32-300e_coco --dest ./checkpoints&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;mim is a Python tool from the OpenMMLab project.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;It works much like pip.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The command downloads the config named rtmdet_tiny_8xb32-300e_coco, together with its checkpoint, and the model is then run from that config.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-ke-mobileStyle=&quot;widthOrigin&quot; data-origin-width=&quot;1504&quot; data-origin-height=&quot;88&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/betbd5/btsHo38mFsH/Gx6oKtHlh7JrNGICS079y1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/betbd5/btsHo38mFsH/Gx6oKtHlh7JrNGICS079y1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/betbd5/btsHo38mFsH/Gx6oKtHlh7JrNGICS079y1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fbetbd5%2FbtsHo38mFsH%2FGx6oKtHlh7JrNGICS079y1%2Fimg.png&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot; loading=&quot;lazy&quot; width=&quot;1504&quot; height=&quot;88&quot; data-origin-width=&quot;1504&quot; data-origin-height=&quot;88&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;The model is only about 55 MB, which is surprising.&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;With LLMs, even an 8B model quantized down to 4 bits still takes roughly 5 to 6 GB...&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;The file layout shows one difference from the usual way of running Python code.&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;What stands out is that everything needed for a run is written into a Python script as a config.&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;Opening the file shows that the config is expressed as Python dictionaries.&lt;/p&gt;
&lt;p style=&quot;color: #333333; text-align: start;&quot; data-ke-size=&quot;size16&quot;&gt;Example of the model config:&lt;/p&gt;
&lt;pre id=&quot;code_1715585949749&quot; class=&quot;python&quot; data-ke-language=&quot;python&quot; data-ke-type=&quot;codeblock&quot;&gt;&lt;code&gt;model = dict(
    backbone=dict(
        act_cfg=dict(inplace=True, type='SiLU'),
        arch='P5',
        channel_attention=True,
        deepen_factor=0.167,
        expand_ratio=0.5,
        init_cfg=dict(
            checkpoint=
            'https://download.openmmlab.com/mmdetection/v3.0/rtmdet/cspnext_rsb_pretrain/cspnext-tiny_imagenet_600e.pth',
            prefix='backbone.',
            type='Pretrained'),
        norm_cfg=dict(type='SyncBN'),
        type='CSPNeXt',
        widen_factor=0.375),
    neck=dict(
        act_cfg=dict(inplace=True, type='SiLU'),
        expand_ratio=0.5,
        in_channels=[
            96,
            192,
            384,
        ],
        norm_cfg=dict(type='SyncBN'),
        num_csp_blocks=1,
        out_channels=96,
        type='CSPNeXtPAFPN'),
    bbox_head=dict(
        act_cfg=dict(inplace=True, type='SiLU'),
        anchor_generator=dict(
            offset=0, strides=[
                8,
                16,
                32,
            ], type='MlvlPointGenerator'),
        bbox_coder=dict(type='DistancePointBBoxCoder'),
        exp_on_reg=False,
        feat_channels=96,
        in_channels=96,
        loss_bbox=dict(loss_weight=2.0, type='GIoULoss'),
        loss_cls=dict(
            beta=2.0,
            loss_weight=1.0,
            type='QualityFocalLoss',
            use_sigmoid=True),
        norm_cfg=dict(type='SyncBN'),
        num_classes=80,
        pred_kernel_size=1,
        share_conv=True,
        stacked_convs=2,
        type='RTMDetSepBNHead',
        with_objectness=False),
    type='RTMDet')&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The code above has been edited slightly to make it easier to follow.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The config mirrors the architecture described above.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;A single model is assembled from exactly three modules: Backbone, Neck, and Head. The fourth component, Loss, is configured inside the head through loss_cls and loss_bbox.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;The part that deserves the most attention is the value given to each type key.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Most of the remaining hyperparameters can simply be set as recommended,&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;whereas type is in effect the name of the Python class to instantiate.&lt;/p&gt;
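&lt;p data-ke-size=&quot;size16&quot;&gt;How a type string can turn into an object is easiest to see with a toy registry. The sketch below only illustrates the general pattern and is not the actual OpenMMLab implementation: &lt;code&gt;Registry&lt;/code&gt;, &lt;code&gt;ToyBackbone&lt;/code&gt;, and &lt;code&gt;ToyDetector&lt;/code&gt; are hypothetical names made up for this example.&lt;/p&gt;

```python
class Registry:
    """Toy registry that maps a class-name string to the class itself."""

    def __init__(self):
        self._classes = {}

    def register(self, cls):
        # Used as a decorator: store the class under its own name.
        self._classes[cls.__name__] = cls
        return cls

    def build(self, cfg):
        cfg = dict(cfg)                        # copy so the caller's dict is untouched
        cls = self._classes[cfg.pop('type')]   # 'type' selects the class
        return cls(**cfg)                      # remaining keys become constructor kwargs


MODELS = Registry()


@MODELS.register
class ToyBackbone:
    def __init__(self, widen_factor=1.0):
        self.widen_factor = widen_factor


@MODELS.register
class ToyDetector:
    def __init__(self, backbone):
        # Nested config dicts are built recursively through the same registry.
        self.backbone = MODELS.build(backbone)


config = dict(
    type='ToyDetector',
    backbone=dict(type='ToyBackbone', widen_factor=0.375),
)
detector = MODELS.build(config)
print(type(detector).__name__, detector.backbone.widen_factor)
```

&lt;p data-ke-size=&quot;size16&quot;&gt;Under this pattern, swapping a module means changing one type string and its keyword arguments while the rest of the config stays as it is.&lt;/p&gt;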
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;With the config in place, the model is run through the following API.&lt;/p&gt;
&lt;pre class=&quot;python&quot; style=&quot;color: #212121; text-align: start;&quot; data-ke-language=&quot;python&quot;&gt;&lt;code&gt;from mmdet.apis import DetInferencer

# Choose to use a config
model_name = 'rtmdet_tiny_8xb32-300e_coco'
# Setup a checkpoint file to load
checkpoint = './checkpoints/rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth'

# Set the device to be used for evaluation
device = 'cuda:0'

# Initialize the DetInferencer
inferencer = DetInferencer(model_name, checkpoint, device)

# Use the detector to do inference
img = './demo/demo.jpg'
result = inferencer(img, out_dir='./output')&lt;/code&gt;&lt;/pre&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
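&lt;p data-ke-size=&quot;size16&quot;&gt;model_name is the config downloaded above, checkpoint is the weights file, and device selects the GPU to use.&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Once these three are set, you create a DetInferencer instance and call it on an image, and the results are saved under out_dir.&lt;/p&gt;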
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;I had not worked on vision since YOLO v3, and coming back to it is a lot of fun!&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;Next, I plan to look at the COCO dataset and then move on to fine-tuning.&lt;/p&gt;</description>
      <category>Object-Detection</category>
      <category>mmdetection</category>
      <category>Object-Detection</category>
      <author>Joon09</author>
      <guid isPermaLink="true">https://datacook.tistory.com/128</guid>
      <comments>https://datacook.tistory.com/128#entry128comment</comments>
      <pubDate>Mon, 13 May 2024 16:49:51 +0900</pubDate>
    </item>
  </channel>
</rss>