Add Graph-based Approach code using NHEFS data #30

Funbucket · 2025-10-03T14:56:31Z

Define the DAG and identify the causal effect via backdoor adjustment
Estimate the ATE with DoWhy + EconML (Linear Regression, DR Learner, DML)
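To make the backdoor-adjustment step concrete, here is a minimal numpy-only sketch on simulated data (not NHEFS, and not the PR's actual DAG). It assumes a hypothetical three-node DAG L → A, L → Y, A → Y with a true effect of 2.0. Conditioning on the confounder L closes the backdoor path A ← L → Y, which a naive difference in means leaves open. The OLS fit stands in for DoWhy's `backdoor.linear_regression` estimator; it does not call DoWhy.

```python
import numpy as np

# Hypothetical toy DAG (not the NHEFS graph): L -> A, L -> Y, A -> Y.
TRUE_EFFECT = 2.0


def simulate(n=20_000, seed=0):
    rng = np.random.default_rng(seed)
    L = rng.normal(size=n)                              # observed confounder
    A = (L + rng.normal(size=n) > 0).astype(float)      # binary treatment driven by L
    Y = TRUE_EFFECT * A + 3.0 * L + rng.normal(size=n)  # outcome driven by A and L
    return L, A, Y


def naive_difference(A, Y):
    # Difference in means: leaves the backdoor path A <- L -> Y open.
    return Y[A == 1].mean() - Y[A == 0].mean()


def backdoor_adjusted(L, A, Y):
    # OLS of Y on [1, A, L]; the coefficient on A is the adjusted effect.
    X = np.column_stack([np.ones_like(A), A, L])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef[1]


L, A, Y = simulate()
print(f"naive:    {naive_difference(A, Y):.2f}")
print(f"adjusted: {backdoor_adjusted(L, A, Y):.2f}")
```

Because treated units tend to have larger L, the naive difference lands well above 2.0, while the adjusted coefficient stays close to it.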

References

modify readme

jhkimon

This was a good opportunity to study the DoWhy package in more depth. It offers many convenient features, and I learned a lot while reading the documentation. I've attached a few brief comments as well!

+) I'm also attaching a book you might find interesting.
Github
Causal Inference and Discovery in Python

jhkimon · 2025-10-12T03:43:47Z

book/scm/backdoor_criterion.ipynb

+    "    layout=\"dot\"\n",
+    ")"
+   ]
+  },


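I noticed that the official documentation runs the code without applying `pd.get_dummies` to categorical variables (see the W2 and W3 variables). DoWhy appears to create the dummy variables internally. To preserve the meaning of the graph, and of which variables actually matter, it seems more appropriate to keep the categorical variables in their original form when running Identification, as the official documentation does.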

I've attached the official documentation below.

DoWhy official documentation
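A numpy-only sketch of why the categorical form matters, under hypothetical assumptions: W2 is a 3-level categorical confounder whose effects on treatment and outcome are not monotone in its integer code. In the graph W2 stays a single node; only the estimator expands it into k−1 dummy columns. Feeding the raw integer codes as if they were a number misspecifies the adjustment and leaves bias. We have not checked exactly how DoWhy encodes categorical columns internally, so this illustrates the principle only, not DoWhy's code path.

```python
import numpy as np

TRUE_EFFECT = 1.5  # hypothetical effect of A on Y


def simulate(n=30_000, seed=1):
    rng = np.random.default_rng(seed)
    w2 = rng.integers(0, 3, size=n)                  # categorical confounder, codes 0/1/2
    p_treat = np.array([0.2, 0.8, 0.3])[w2]          # non-monotone treatment probability
    a = (rng.random(n) < p_treat).astype(float)
    shift = np.array([0.0, 2.0, -1.0])[w2]           # non-monotone effect of W2 on Y
    y = TRUE_EFFECT * a + shift + rng.normal(size=n)
    return w2, a, y


def one_hot_drop_first(codes, n_levels):
    # k-level categorical -> k-1 indicator columns, level 0 as the reference.
    return np.eye(n_levels)[codes][:, 1:]


def adjusted_effect(w2, a, y, encode_categorical=True):
    if encode_categorical:
        covariates = one_hot_drop_first(w2, 3)
    else:
        covariates = w2[:, None].astype(float)       # misspecified: treats codes as a number
    X = np.column_stack([np.ones_like(a), a, covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


w2, a, y = simulate()
print(f"W2 as dummies:       {adjusted_effect(w2, a, y):.2f}")
print(f"W2 as integer codes: {adjusted_effect(w2, a, y, encode_categorical=False):.2f}")
```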

Additionally, explicitly specifying `control_value=0, treatment_value=1` when running identify_effect() would make the code more consistent.
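As far as we know, in DoWhy's `CausalModel` these two arguments are accepted by `estimate_effect()` (where they default to 0 and 1) rather than by `identify_effect()`, which only returns the estimand. The sketch below, built around a hypothetical helper `ate_by_standardization`, shows what they control: the contrast E[Y | do(A = treatment_value)] − E[Y | do(A = control_value)]. With a non-binary treatment (here a made-up dose in {0, 1, 2}) the explicit values change the answer, which is a good reason to always write them out.

```python
import numpy as np


def ate_by_standardization(a, y, covariates, control_value=0, treatment_value=1):
    # Hypothetical helper: fit Y ~ 1 + A + covariates by OLS, then compare the
    # average prediction with A set to treatment_value vs. control_value.
    X = np.column_stack([np.ones_like(a), a, covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    def mean_prediction(a_value):
        X_do = X.copy()
        X_do[:, 1] = a_value  # "do(A = a_value)" for every unit
        return (X_do @ coef).mean()

    return mean_prediction(treatment_value) - mean_prediction(control_value)


rng = np.random.default_rng(2)
n = 10_000
w = rng.normal(size=n)
dose = rng.integers(0, 3, size=n).astype(float)  # made-up treatment with levels 0, 1, 2
y = 0.7 * dose + w + rng.normal(size=n)

effect_0_to_1 = ate_by_standardization(dose, y, w)
effect_0_to_2 = ate_by_standardization(dose, y, w, control_value=0, treatment_value=2)
print(f"0 -> 1: {effect_0_to_1:.2f}, 0 -> 2: {effect_0_to_2:.2f}")
```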

jhkimon · 2025-10-12T03:51:07Z

book/scm/backdoor_criterion.ipynb

+   ],
+   "source": [
+    "estimate_dml_fast = est_model.estimate_effect(\n",
+    "    identified_estimand=estimand,\n",


The result looks the same as above, so I'm curious why you defined estimate_dml_fast separately. If it's a duplicate, it should be fine to delete it.

If it's for the Refute step below, you could move it to a separate cell; and since the result (ATE) is similar whichever model is used, a lighter model should also be fine!

In the estimate step, I used bootstrapping to report confidence intervals.
The refute step doesn't need them, so I left bootstrapping out there to speed things up.
As you suggested, I'll move the code cell!
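For reference, a minimal sketch of the percentile bootstrap that makes the estimate step slower: the estimator is refit on `n_boot` resamples of the rows, so runtime grows roughly linearly with `n_boot`, while the refute step only needs the point estimate. The data are simulated, and `regression_ate` / `bootstrap_ci` are hypothetical helpers, not DoWhy or EconML functions; both libraries have their own bootstrap options whose internals this does not reproduce.

```python
import numpy as np


def regression_ate(a, y, w):
    # Backdoor-adjusted effect: coefficient on A in an OLS fit of Y on [1, A, W].
    X = np.column_stack([np.ones_like(a), a, w])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


def bootstrap_ci(a, y, w, n_boot=200, alpha=0.05, seed=0):
    # Percentile bootstrap: refit on n_boot row resamples (cost ~ n_boot full fits).
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        stats[b] = regression_ate(a[idx], y[idx], w[idx])
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])


rng = np.random.default_rng(3)
n = 5_000
w = rng.normal(size=n)
a = (w + rng.normal(size=n) > 0).astype(float)
y = 1.0 * a + 2.0 * w + rng.normal(size=n)

point = regression_ate(a, y, w)    # all the refute step needs
low, high = bootstrap_ci(a, y, w)  # extra cost paid only in the estimate step
print(f"ATE {point:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```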

jhkimon · 2025-10-12T03:52:45Z

book/scm/overview.md

@@ -0,0 +1,9 @@
+# SCM


When presenting the Overview, also showing the official 4-step code that DoWhy provides would make it much more intuitive.

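```python
import dowhy.datasets
from dowhy import CausalModel

# Sample data from DoWhy's synthetic generator (binary treatment, which
# propensity score matching requires).
data = dowhy.datasets.linear_dataset(
    beta=10,
    num_common_causes=5,
    num_instruments=2,
    num_samples=10000,
    treatment_is_binary=True,
)

# I. Create a causal model from the data and given graph.
model = CausalModel(
    data=data["df"],
    treatment=data["treatment_name"],
    outcome=data["outcome_name"],
    graph=data["gml_graph"],
)

# II. Identify causal effect and return target estimands.
identified_estimand = model.identify_effect()

# III. Estimate the target estimand using a statistical method.
estimate = model.estimate_effect(
    identified_estimand, method_name="backdoor.propensity_score_matching"
)

# IV. Refute the obtained estimate using multiple robustness checks.
refute_results = model.refute_estimate(
    identified_estimand, estimate, method_name="random_common_cause"
)
```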

DoWhy official documentation

Also, since the current content covers the whole process from Identification through Estimation and Refute, a title such as 'Introduction to DoWhy' might fit better than SCM!

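Thanks for the feedback! I went back and forth a lot on the table of contents and the titles.

I chose SCM because the plan was to put the whole structure on top of the SCM concept and then expand it step by step with the following subtopics:

backdoor criterion

frontdoor criterion

instrumental variable (IV)

causal discovery

Covering all of them on a single page could make it quite long, so I first split out a page centered on the backdoor criterion.
(Of course, reorganizing it into a combined page later would also be fine.)

Also, although it uses the PyWhy ecosystem (dowhy, econml, etc.), its core is the flow of implementing causal identification, estimation, and validation on top of the SCM framework.

So rather than a plain introduction to DoWhy, I wanted to emphasize that it is an example that turns SCM concepts into working code!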

jhkimon · 2025-10-12T04:04:02Z

book/scm/backdoor_criterion.ipynb

+   "id": "60080461",
+   "metadata": {},
+   "source": [
+    "## Refute"


The official example notebook splits the Refute step into two broad groups, Invariant transformations and Nullifying transformations. It would be good to organize the checks by type in the same way and add an explanation for each.

Official notebook
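As an illustration of the two groups, a numpy-only sketch on simulated data. An invariant transformation (adding an independent random common cause) should leave a sound estimate nearly unchanged; a nullifying transformation (replacing the treatment with a random permutation, i.e. a placebo) should drive the estimate towards 0. DoWhy implements these ideas as the `random_common_cause` and `placebo_treatment_refuter` refuters; the code below does not call DoWhy and only mimics the logic.

```python
import numpy as np


def regression_ate(a, y, covariates):
    # Coefficient on A in an OLS fit of Y on [1, A, covariates].
    X = np.column_stack([np.ones_like(a), a, covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


rng = np.random.default_rng(4)
n = 10_000
w = rng.normal(size=n)
a = (w + rng.normal(size=n) > 0).astype(float)
y = 1.0 * a + 2.0 * w + rng.normal(size=n)
original = regression_ate(a, y, w)

# Invariant transformation: add an independent random common cause.
# A sound estimate should barely move.
random_cause = rng.normal(size=n)
with_random_cause = regression_ate(a, y, np.column_stack([w, random_cause]))

# Nullifying transformation: swap the treatment for a random permutation (placebo).
# The "effect" of a placebo should be close to 0.
placebo = rng.permutation(a)
placebo_effect = regression_ate(placebo, y, w)

print(f"original {original:.3f} | random cause {with_random_cause:.3f} | placebo {placebo_effect:.3f}")
```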

jhkimon · 2025-10-12T04:05:15Z

book/scm/backdoor_criterion.ipynb

+   "metadata": {},
+   "source": [
+    "### 2. Add Unobserved Common Cause\n",
+    "If we assume an unobserved confounder exists in the data, how much does the estimate change?\n",


This is also the hardest part to validate in practice, and the official documentation itself says it relies on 'domain knowledge'. Accordingly, it would be good to state the caveats, or the reason for using this refuter.

"Importance of domain knowledge: This test requires domain knowledge to set plausible input values of the effect of unobserved confounding. We first show the result for a single value of confounder's effect on treatment and outcome."

DoWhy notebook
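To show why plausible input values need domain knowledge, a simulated sketch (not DoWhy's `add_unobserved_common_cause` implementation): a hidden confounder U with a hypothetical `strength` on both treatment and outcome is generated but never given to the estimator. The true effect is fixed at 1.0; the estimate drifts further from it as the assumed strength grows, and the data alone cannot tell which strength is realistic.

```python
import numpy as np


def regression_ate(a, y, w):
    # Coefficient on A in an OLS fit of Y on [1, A, W].
    X = np.column_stack([np.ones_like(a), a, w])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


def estimate_with_hidden_confounder(strength, n=20_000, seed=5, true_effect=1.0):
    # Hidden U affects both A and Y with the same hypothetical strength,
    # but the estimator only adjusts for the observed W.
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)
    u = rng.normal(size=n)  # unobserved: never passed to regression_ate
    a = (w + strength * u + rng.normal(size=n) > 0).astype(float)
    y = true_effect * a + 2.0 * w + strength * u + rng.normal(size=n)
    return regression_ate(a, y, w)


for strength in [0.0, 0.5, 1.0, 2.0]:
    print(f"strength {strength:.1f} -> estimate {estimate_with_hidden_confounder(strength):.3f}")
```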

jhkimon · 2025-10-12T04:08:36Z

book/scm/backdoor_criterion.ipynb

+   "id": "de707838",
+   "metadata": {},
+   "source": [
+    "# Backdoor Criterion"


For a reason similar to the Overview, a title like "DoWhy | An end-to-end library for causal inference" seems more appropriate than "Backdoor Criterion". The code covers the entire causal inference process, whereas the backdoor criterion names only one part of it (identification).
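To illustrate that the backdoor criterion covers only the identification step, a pure-Python sketch of that step alone: checking whether a candidate adjustment set Z satisfies the backdoor criterion for (A, Y) by enumerating paths. The DAG, node names and helpers are hypothetical; DoWhy's `identify_effect()` uses its own identification algorithm, which this does not reproduce, and brute-force path enumeration is only practical for small graphs.

```python
def all_nodes(dag):
    return set(dag) | {child for children in dag.values() for child in children}


def parents(dag, node):
    return {p for p, children in dag.items() if node in children}


def descendants(dag, node):
    seen, stack = set(), [node]
    while stack:
        for child in dag.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def simple_paths(dag, start, end):
    # All simple paths from start to end in the skeleton (edge directions ignored).
    neighbours = {n: set(dag.get(n, ())) | parents(dag, n) for n in all_nodes(dag)}

    def extend(path):
        if path[-1] == end:
            yield path
            return
        for nb in neighbours[path[-1]]:
            if nb not in path:
                yield from extend(path + [nb])

    yield from extend([start])


def is_blocked(dag, path, z):
    # Blocked if some interior node is a non-collider in Z, or a collider
    # that is not in Z and has no descendant in Z.
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        is_collider = node in dag.get(prev, ()) and node in dag.get(nxt, ())
        if is_collider:
            if node not in z and not (descendants(dag, node) & z):
                return True
        elif node in z:
            return True
    return False


def satisfies_backdoor(dag, treatment, outcome, z):
    z = set(z)
    # (i) No element of Z is a descendant of the treatment.
    if descendants(dag, treatment) & z:
        return False
    # (ii) Z blocks every path that starts with an arrow into the treatment.
    treatment_parents = parents(dag, treatment)
    return all(
        is_blocked(dag, path, z)
        for path in simple_paths(dag, treatment, outcome)
        if path[1] in treatment_parents
    )


# Hypothetical DAG (node -> children): confounder L, M-structure A <- U1 -> M <- U2 -> Y,
# and D, a descendant of the treatment A.
dag = {
    "L": {"A", "Y"},
    "A": {"Y", "D"},
    "U1": {"A", "M"},
    "U2": {"M", "Y"},
}

print(satisfies_backdoor(dag, "A", "Y", {"L"}))       # True: collider M keeps the M-path blocked
print(satisfies_backdoor(dag, "A", "Y", {"L", "M"}))  # False: conditioning on M opens it
```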

jhkimon · 2025-10-12T04:13:06Z

book/scm/backdoor_criterion.ipynb

+     ]
+    }
+   ],
+   "source": [


Separately, DoWhy's GCM API also seems to offer powerful features, such as counterfactual inference (directly answering "what if?" questions) and estimating the strength of each arrow. It could be fun to cover that as a next topic.

Official documentation

Reference code/book - Causal-Inference-and-Discovery-in-Python
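For a feel of what a counterfactual query does, a hand-rolled sketch of the standard three steps (abduction, action, prediction) on a hypothetical linear SCM with made-up coefficients. The GCM API automates this with fitted causal mechanisms; the code below does not use it.

```python
# Hypothetical linear SCM with additive noise (coefficients made up):
#   Z := N_Z
#   X := 0.5 * Z + N_X
#   Y := 2.0 * X + 1.0 * Z + N_Y


def counterfactual_y(z_obs, x_obs, y_obs, x_new):
    # 1. Abduction: recover this unit's noise terms from its observed values.
    n_z = z_obs
    n_y = y_obs - 2.0 * x_obs - 1.0 * z_obs
    # (N_X is not needed: the intervention below replaces X's equation.)
    # 2. Action: do(X = x_new).
    x_cf = x_new
    # 3. Prediction: push the same noise through the modified model.
    #    Z is upstream of X, so it keeps its value.
    z_cf = n_z
    return 2.0 * x_cf + 1.0 * z_cf + n_y


# Unit observed with Z=1, X=1, Y=4: what would Y have been had X been 0?
print(counterfactual_y(z_obs=1.0, x_obs=1.0, y_obs=4.0, x_new=0.0))  # 2.0
```

As a sanity check, setting `x_new` to the observed X reproduces the observed Y.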

Funbucket · 2025-10-12T09:46:49Z

Thank you for the thorough review :)
It helped me catch a lot of things I had missed. I'll incorporate the feedback!

Funbucket and others added 9 commits September 10, 2025 01:09

Merge pull request #1 from CausalInferenceLab/main

b4b517c

modify readme

Merge branch 'CausalInferenceLab:main' into main

5c533ed

Merge branch 'CausalInferenceLab:main' into main

3c46aa8

Merge commit '4f52b22a88142ffdcda11b80a645eb36416454ba'

3fd49b6

Merge commit 'f78f645501af34a2b8a6a76b964a6b0d1b12f95e'

c3d4573

update

d8c5791

update toc.yml

9edda84

delete dag.png

f6e67c4

update scm

121837c

Funbucket self-assigned this Oct 3, 2025

Funbucket linked an issue Oct 3, 2025 that may be closed by this pull request

Request to add an SCM: Graph-Based Approach page. #24

Open

update backdoor criterion

0458ef7

jhkimon reviewed Oct 12, 2025

2 participants
