Codex CLI Refactoring Guide: Safe Workflow & Real-World Techniques

Code refactoring, which means modifying working code to improve its internal structure without changing its external behavior, is one of the highest-risk operations in software engineering. Codex CLI reduces refactoring risk through sandbox isolation, step-by-step review, and automated testing. This guide covers everything from extracting a single function to migrating JavaScript to TypeScript and refactoring 100K+ line codebases.

1. Why Use AI for Code Refactoring?

The core risk in manual refactoring is that it is hard for a person to change code while keeping every behavior identical. This is especially true for cross-file renames, large function splits, and similar operations where a single missed reference is easy to overlook. Codex CLI reduces this risk through several mechanisms:

Sandbox isolation: Commands run in a sandbox, and in suggest mode no file change reaches your codebase until you explicitly confirm it
Global consistency: Cross-file renames are applied to every reference Codex finds across the codebase, including those in strings and comments, so stale names are far less likely to slip through
Test-driven verification: A single task can generate tests, refactor, then run tests — a complete closed loop
Reviewable diffs: In suggest mode, every change is presented as a diff for human confirmation before being applied

Dimension	Manual Refactoring	Codex-Assisted
Cross-file renames	IDE global replace — misses references in string literals and comments	Scans all files including strings and comments; handles uniformly
Large function splits	Manually identify responsibility boundaries; easy to miss edge cases	Auto-identifies responsibilities, generates extraction plan, preserves call signatures
TypeScript migration	File-by-file annotation; repetitive and slow	Bulk type inference, auto-annotation, flags positions it cannot infer
Regression risk	Relies on manual testing or existing test coverage	Can generate missing tests in the same task, then verify immediately
Reversibility	Requires manually creating git stash or branch	Recommends git branches; Codex can generate the full branching workflow

Best use cases: Codex CLI excels at structural refactoring with clear goals — extract function, rename, split file, type migration. It is less effective for vague goals like "make the code better." The more specific your prompt, the more reliable the refactoring output.

2. The 5-Step Safe Refactoring Workflow

Regardless of refactoring scope, following these five steps minimizes risk: each one catches a different kind of mistake before it reaches your main branch.

Step 1: Use dry-run to Assess Scope

Before any actual modifications, use the --dry-run flag to understand what Codex plans to do:

# Preview the refactoring plan without making any changes
codex exec --dry-run "Refactor the UserService class:
- Split createUser, updateUser, deleteUser into separate modules
- Extract common parameter validation logic into a validateUserInput function
- Keep all public method signatures unchanged"

# See which files will be touched
codex exec --dry-run "Extract all functions over 50 lines in src/utils.ts into separate files"

The dry-run output tells you: which files will be modified, which will be created, which will be deleted, and the estimated line count of changes. This lets you judge whether the refactoring plan is sound before committing to it.

Step 2: Create a git Branch to Protect Existing Code

# Create a dedicated refactoring branch
git checkout -b refactor/extract-user-service

# Or let Codex create the branch and run the refactoring
codex exec "Create a git branch refactor/extract-user-service, then refactor src/services/user.ts:
split the processUserData function (80+ lines) into three separate functions
that handle data validation, data transformation, and data persistence respectively"
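As a rough illustration of where that split should land, here is a TypeScript sketch. Only the three-responsibility split comes from the prompt; the `UserInput` and `UserRecord` shapes, the validation rules, the helper names, and the in-memory `Map` standing in for the database are all hypothetical.

```typescript
// Hypothetical shapes; the real src/services/user.ts will differ.
interface UserInput {
  name: string;
  email: string;
}

interface UserRecord {
  id: number;
  name: string;
  email: string;
}

// In-memory stand-in for the database layer (assumption for this sketch).
const userStore = new Map<number, UserRecord>();
let nextUserId = 1;

// Responsibility 1: validation. Throws on bad input and returns nothing.
function validateUserData(input: UserInput): void {
  if (input.name.trim() === "") throw new Error("name must not be empty");
  if (input.email.indexOf("@") === -1) throw new Error("email must contain @");
}

// Responsibility 2: transformation. Pure, so it is trivial to unit test.
function transformUserData(input: UserInput): Omit<UserRecord, "id"> {
  return { name: input.name.trim(), email: input.email.trim().toLowerCase() };
}

// Responsibility 3: persistence. The only step with side effects.
function persistUserData(data: Omit<UserRecord, "id">): UserRecord {
  const record: UserRecord = { id: nextUserId++, ...data };
  userStore.set(record.id, record);
  return record;
}

// The original entry point keeps its signature and now only orchestrates.
function processUserData(input: UserInput): UserRecord {
  validateUserData(input);
  return persistUserData(transformUserData(input));
}
```

Because only persistUserData touches storage, the validation and transformation steps can be tested without any database setup.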

Step 3: Review Incrementally in Suggest Mode

# Suggest mode (default) — shows diff and waits for confirmation
codex "Extract the duplicate error-handling code in src/api/users.ts
into a handleApiError function, place it in src/utils/errors.ts,
and update all call sites with the correct import"

# Review the diff, then choose to apply or skip each change

Don't skip the review: Even when you trust Codex's output, reading through the diff is the best time to catch potential issues. Pay particular attention to: changes in function signatures, modified import paths, and whether deleted code is genuinely no longer needed.
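For a sense of what the helper extracted by the Step 3 prompt might contain, here is a minimal sketch. The `ApiError` class, the `ApiResponse` shape, the `getUser` call site, and the 500 fallback are assumptions for illustration; your API layer's real error contract should drive the actual implementation.

```typescript
// Hypothetical error type thrown by the API layer.
class ApiError extends Error {
  readonly status: number;

  constructor(status: number, message: string) {
    super(message);
    this.name = "ApiError";
    this.status = status;
    // Keeps instanceof working if the code is compiled to ES5.
    Object.setPrototypeOf(this, ApiError.prototype);
  }
}

// Assumed response shape returned by route handlers.
interface ApiResponse {
  status: number;
  body: Record<string, unknown>;
}

// The shared helper that would live in src/utils/errors.ts.
function handleApiError(err: unknown): ApiResponse {
  if (err instanceof ApiError) {
    return { status: err.status, body: { error: err.message } };
  }
  // Unknown failures become a generic 500 so internals are not leaked.
  return { status: 500, body: { error: "Internal server error" } };
}

// A call site in src/api/users.ts after the refactor: one line in the catch block.
function getUser(id: string): ApiResponse {
  try {
    if (id === "") throw new ApiError(400, "id is required");
    return { status: 200, body: { id } };
  } catch (err) {
    return handleApiError(err);
  }
}
```

When reviewing the real diff, this is the kind of change worth checking line by line: every former catch block should now delegate to the helper, and the status codes it returns should match what the old inline code returned.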

Step 4: Run Tests to Verify Behavior Is Unchanged

# After applying Codex's changes, immediately run tests
npm test

# Or instruct Codex to run tests automatically after refactoring
codex exec "Refactor UserService, then run npm test after completing.
If any tests fail, list the failure reasons and fix them"

Step 5: Code Review Then Merge

# Review everything the branch changed since it diverged from main (note the three dots)
git diff main...refactor/extract-user-service

# Create a PR (if using GitHub)
gh pr create --title "refactor: extract UserService into modules" \
  --body "Codex-assisted refactoring. All tests pass. No behavior changes."

# Merge after approval (switch to main first)
git checkout main
git merge refactor/extract-user-service

3. Three Core Refactoring Patterns

Pattern 1: Extract Function

Extracting a segment of logic from a large function into its own named function is the most common refactoring operation. Here is a typical extraction workflow:

# Identify: ask Codex to find extractable logic blocks
codex "Analyze the processPayment function in src/checkout/payment.ts (~120 lines),
identify blocks of logic that can be extracted into standalone functions,
and generate an extraction plan (don't execute — just list recommendations)"

# Execute: extract one block at a time
codex "Extract the card validation logic from processPayment into
validateCardNumber(cardNumber: string): boolean,
keep processPayment calling validateCardNumber,
do not change the function's external behavior"

codex "Extract the amount calculation logic from processPayment into
calculateFinalAmount(items, discounts, tax): number,
including discount application and tax rate computation,
keep processPayment calling the new function"

Extraction tip: Extract one logical block per task, then run the tests immediately and confirm they pass before extracting the next. Avoid trying to extract all logic in a single task; incremental extraction is safer and makes problems much easier to isolate.
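Here is a hedged sketch of processPayment after both extractions. The two function names and signatures come from the prompts above. The `LineItem` and `Discount` shapes, the Luhn checksum used inside validateCardNumber, percentage discounts, and rounding to cents are all assumptions, not the behavior of any real payment.ts.

```typescript
interface LineItem {
  price: number; // unit price (assumed currency units)
  quantity: number;
}

interface Discount {
  percent: number; // e.g. 10 means 10% off (assumed)
}

// Extraction 1: card validation. A Luhn checksum is used here as a stand-in rule.
function validateCardNumber(cardNumber: string): boolean {
  const digits = cardNumber.replace(/\s+/g, "");
  if (!/^\d{12,19}$/.test(digits)) return false;
  let sum = 0;
  // Walk from the rightmost digit, doubling every second one.
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits.charAt(digits.length - 1 - i));
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// Extraction 2: amount calculation (discounts, then tax), rounded to cents.
function calculateFinalAmount(items: LineItem[], discounts: Discount[], tax: number): number {
  const subtotal = items.reduce((acc, it) => acc + it.price * it.quantity, 0);
  const discounted = discounts.reduce((acc, d) => acc * (1 - d.percent / 100), subtotal);
  return Math.round(discounted * (1 + tax) * 100) / 100;
}

// processPayment now only orchestrates; callers see the same behavior as before.
function processPayment(cardNumber: string, items: LineItem[], discounts: Discount[], tax: number): number {
  if (!validateCardNumber(cardNumber)) {
    throw new Error("Invalid card number");
  }
  return calculateFinalAmount(items, discounts, tax);
}
```

Each extracted function is now small enough to test directly, which is what makes the "run tests after every extraction" habit cheap.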

Pattern 2: Rename Refactoring (Enforce Naming Conventions)

When a codebase has accumulated inconsistent naming styles, cross-file renaming is the operation most likely to introduce bugs — and also where Codex has the greatest advantage:

# Enforce camelCase variable naming
codex "Scan all TypeScript files under src/, convert all snake_case
variable names and function names to camelCase.
Do NOT modify: object keys (may come from APIs), file names, class names,
or UPPER_SNAKE_CASE constants"

# Rename a specific function across all files
codex "Rename the getUserData function to fetchUserProfile everywhere in src/
including test files, and update related comments and JSDoc strings"

# Enforce I-prefix on all interfaces
codex "Add an I prefix to all interface names in src/types/ that don't already have one
(e.g. User becomes IUser), then update every file that references these interfaces"
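The conversion rule in the first prompt is mechanical enough to state as a function. This sketch (the name `snakeToCamel` is hypothetical) shows the per-identifier transformation and leaves UPPER_SNAKE_CASE constants alone, matching the constant convention in the AGENTS.md standards later in this guide. What it cannot do is decide whether an identifier is a variable or an object key; that judgment depends on context, which is why the prompt spells out the exclusions explicitly.

```typescript
// Converts one lower snake_case identifier to camelCase.
// UPPER_SNAKE_CASE constants and identifiers with no snake_case segments come back unchanged.
function snakeToCamel(identifier: string): string {
  // Constants such as MAX_RETRIES follow a separate convention, so skip them.
  if (/^[A-Z0-9_]+$/.test(identifier)) return identifier;
  // Preserve leading underscores (e.g. _private_helper keeps its "_" prefix).
  const leading = identifier.match(/^_*/)?.[0] ?? "";
  const body = identifier.slice(leading.length);
  // Drop each run of underscores and uppercase the character that follows it.
  return leading + body.replace(/_+([a-z0-9])/g, (_match: string, ch: string) => ch.toUpperCase());
}

const examples = ["user_name", "get_user_data", "MAX_RETRIES", "_private_helper", "alreadyCamel"];
const converted = examples.map(snakeToCamel);
```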

Pattern 3: Split Large Files

When a single file exceeds 1,000 lines, maintenance cost rises sharply. Here is the standard workflow for splitting a large file into focused modules:

# Step 1: analyze responsibilities
codex "Analyze src/api.ts (~1200 lines), identify how many distinct
responsibility domains it contains, and suggest how to split it into
multiple files by responsibility — give a proposed file structure (don't execute)"

# Step 2: extract one responsibility at a time
codex "Extract all user-related API functions (createUser, getUser, updateUser, deleteUser)
from src/api.ts into src/api/users.ts,
keep re-exports in src/api.ts for backward compatibility"

codex "Extract all product-related API functions from src/api.ts into src/api/products.ts,
keeping the same re-exports in src/api.ts"

# Step 3: clean up the entry file
codex "src/api.ts now contains only re-exports. Convert it into a proper barrel file.
Update any call sites that import specific functions directly from src/api.ts
to import from the relevant submodule instead"

4. TypeScript Migration Refactoring

Migrating a JavaScript project to TypeScript is a special kind of refactoring: runtime behavior stays identical while static type information is added. Codex CLI can dramatically accelerate this process.

Adding Type Annotations

# Batch-add basic type annotations
codex "Convert src/utils/string.js to TypeScript:
- Rename the file to string.ts
- Add type annotations to all function parameters and return values
- Use precise types (no any)
- Do not change any business logic"

# Generate a .d.ts declaration file for a JS file (gradual migration)
codex "Generate a TypeScript declaration file payment.d.ts for src/legacy/payment.js,
inferring types by analyzing actual usage patterns in the codebase"

Eliminating any Types

# Find and fix all any types
codex "Scan all TypeScript files under src/ for uses of any.
Where a concrete type can be inferred, replace with the precise type.
Where the type genuinely cannot be determined, replace with unknown
and add a TODO comment explaining why"

# Fix any in a specific file
codex "Fix all any types in src/services/api.ts:
- API response types should reference existing interfaces in src/types/api.ts
- Event handler parameters should use correct DOM event types
- Third-party callback parameters should reference the library's type definitions"
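The unknown-plus-TODO fallback in the first prompt works because a value typed `unknown` cannot be used until it is narrowed, so the compiler forces the check that `any` silently skipped. A minimal sketch; `AppConfig`, `isAppConfig`, and `parseConfig` are hypothetical names:

```typescript
interface AppConfig {
  port: number;
  host: string;
}

// Before: function parseConfig(raw: any): AppConfig { return raw; }
// With any, a malformed object flowed through unchecked.

// Type guard: the only way to turn unknown into AppConfig.
function isAppConfig(value: unknown): value is AppConfig {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return typeof candidate["port"] === "number" && typeof candidate["host"] === "string";
}

// TODO(types): raw is unknown because it comes from JSON.parse,
// so its shape can only be checked at runtime.
function parseConfig(raw: unknown): AppConfig {
  if (!isAppConfig(raw)) {
    throw new TypeError("Invalid config: expected { port: number, host: string }");
  }
  return raw; // narrowed to AppConfig by the guard
}
```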

Extracting Interfaces

# Extract an interface from an implementation
codex "Analyze src/services/UserService.ts.
Extract an IUserService interface (public methods only)
into src/interfaces/IUserService.ts.
Make UserService implement IUserService.
Update code that depends on UserService to depend on IUserService instead
(program to the interface, not the implementation)"

# Infer interfaces from JSON data
codex "Based on the data structure in src/mock/user-response.json,
generate corresponding TypeScript interfaces in src/types/user.ts.
Mark fields that may be missing with ? (optional properties)
and fields that may be null with | null"
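The interface extraction in the first prompt of this listing produces a shape like the one below, shown in a single file for brevity (the prompt places the interface in src/interfaces/IUserService.ts). The method set, the `User` type, the `greetUser` consumer, and the stub are hypothetical.

```typescript
interface User {
  id: string;
  name: string;
}

// Extracted contract: public methods only.
interface IUserService {
  getUser(id: string): User | undefined;
  createUser(name: string): User;
}

// The existing class now declares that it implements the contract.
class UserService implements IUserService {
  private users = new Map<string, User>();
  private counter = 0;

  getUser(id: string): User | undefined {
    return this.users.get(id);
  }

  createUser(name: string): User {
    const user: User = { id: String(++this.counter), name };
    this.users.set(user.id, user);
    return user;
  }
}

// Dependent code takes the interface, not the concrete class.
function greetUser(service: IUserService, id: string): string {
  const user = service.getUser(id);
  return user ? `Hello, ${user.name}` : "User not found";
}

// Any object with the same methods satisfies the contract, including a test stub.
const stubService: IUserService = {
  getUser: (id) => (id === "1" ? { id, name: "Stub" } : undefined),
  createUser: (name) => ({ id: "stub", name }),
};
```

Because greetUser only knows about IUserService, the stub can replace the real class in tests without any mocking library.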

Gradual TS migration advice: Don't switch the entire project to strict TypeScript all at once. Recommended path: (1) rename .js to .ts, allowing implicit any; (2) add types module by module; (3) finally enable strict mode and fix remaining issues. Record the current migration phase in AGENTS.md so Codex applies consistent standards across sessions.

5. Test-Driven Refactoring (TDD Refactoring)

The core principle of TDD refactoring: have test coverage first, then change the code. Tests are your safety net — with tests in place, refactoring becomes a well-protected operation.

Step 1: Have Codex Generate Tests Covering Existing Behavior

# Generate comprehensive tests for the code you're about to refactor
codex "Generate complete unit tests for the calculateOrderTotal function
in src/services/order.ts:
- Cover normal cases (with discount / no discount / multiple items)
- Cover edge cases (empty cart / zero amount / very large amounts)
- Cover error cases (invalid item ID / negative quantity)
Place the test file in src/services/__tests__/order.test.ts using Jest"

# Confirm all tests pass (validate the tests themselves are correct)
npm test src/services/__tests__/order.test.ts
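The generated Jest file will be specific to your code, but the idea behind it, pinning today's behavior as expected values before anything changes, can be sketched without a test framework. The `Item` and `OrderOptions` shapes, the body of calculateOrderTotal (a stand-in for the current implementation), and the case values are all hypothetical; the case names mirror the categories in the prompt.

```typescript
interface Item {
  price: number;
  quantity: number;
}

interface OrderOptions {
  discounts: number[]; // fractional, e.g. 0.2 for 20% off (assumed)
  taxRate: number; // e.g. 0.1 for 10% (assumed)
}

// Hypothetical stand-in for the current, not-yet-refactored implementation.
function calculateOrderTotal(items: Item[], options: OrderOptions): number {
  let total = 0;
  for (const item of items) {
    if (item.quantity < 0) throw new RangeError("negative quantity");
    total += item.price * item.quantity;
  }
  for (const d of options.discounts) total *= 1 - d;
  total *= 1 + options.taxRate;
  return Math.round(total * 100) / 100;
}

// Each case pins either an expected return value or an expected error.
type Case =
  | { name: string; items: Item[]; options: OrderOptions; expected: number }
  | { name: string; items: Item[]; options: OrderOptions; throws: true };

const cases: Case[] = [
  { name: "no discount", items: [{ price: 10, quantity: 2 }], options: { discounts: [], taxRate: 0.1 }, expected: 22 },
  { name: "with discount", items: [{ price: 100, quantity: 1 }], options: { discounts: [0.2], taxRate: 0 }, expected: 80 },
  { name: "empty cart", items: [], options: { discounts: [], taxRate: 0.1 }, expected: 0 },
  { name: "negative quantity", items: [{ price: 5, quantity: -1 }], options: { discounts: [], taxRate: 0 }, throws: true },
];

// Runs every case against fn and returns the names of the cases that fail.
function runCases(fn: (items: Item[], options: OrderOptions) => number): string[] {
  const failed: string[] = [];
  for (const c of cases) {
    try {
      const actual = fn(c.items, c.options);
      // Returning normally fails a "throws" case; otherwise compare the value.
      if ("throws" in c || actual !== c.expected) failed.push(c.name);
    } catch {
      if (!("throws" in c)) failed.push(c.name);
    }
  }
  return failed;
}
```

Running runCases against the current code must produce an empty list before refactoring starts; a non-empty list at that point means the expected values, not the code, need another look.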

Step 2: Execute the Refactoring

# Once tests pass, run the refactoring
codex "Refactor the calculateOrderTotal function in src/services/order.ts:
- Extract discount calculation logic into applyDiscounts(items, discounts)
- Extract tax calculation logic into applyTax(subtotal, taxRate)
- The main function should only orchestrate — no calculation logic inline
- Keep the function signature unchanged: calculateOrderTotal(items, options): number"
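A minimal sketch of the refactored function. The function names and the unchanged calculateOrderTotal(items, options) signature come from the prompt. The `Item` and `OrderOptions` shapes, fractional discounts, the negative-quantity check, and rounding to cents are assumptions standing in for whatever the existing order.ts does; the hypothetical roundToCents helper keeps even that last step out of the orchestrator.

```typescript
interface Item {
  price: number;
  quantity: number;
}

interface OrderOptions {
  discounts: number[]; // fractional, e.g. 0.2 for 20% off (assumed)
  taxRate: number; // e.g. 0.1 for 10% (assumed)
}

// Extracted: sums the line items, then applies each discount in turn.
function applyDiscounts(items: Item[], discounts: number[]): number {
  let subtotal = 0;
  for (const item of items) {
    // Assumed existing rule, carried over unchanged: negative quantities throw.
    if (item.quantity < 0) throw new RangeError("negative quantity");
    subtotal += item.price * item.quantity;
  }
  return discounts.reduce((amount, d) => amount * (1 - d), subtotal);
}

// Extracted: applies the tax rate to an already-discounted subtotal.
function applyTax(subtotal: number, taxRate: number): number {
  return subtotal * (1 + taxRate);
}

// Hypothetical helper so that rounding also stays out of the orchestrator.
function roundToCents(amount: number): number {
  return Math.round(amount * 100) / 100;
}

// Signature unchanged; the body only sequences the extracted steps.
function calculateOrderTotal(items: Item[], options: OrderOptions): number {
  return roundToCents(applyTax(applyDiscounts(items, options.discounts), options.taxRate));
}
```

The arithmetic is performed in the same order as a straightforward inline version would do it, which matters for floating-point results: reordering discount and tax steps can change the last cent.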

Step 3: Verify Tests Still Pass

# Run tests after refactoring to confirm behavior is unchanged
npm test src/services/__tests__/order.test.ts

# If there are failures, ask Codex to diagnose
codex "Tests are failing. The error output is: [paste error output here]
Analyze the behavioral difference between the refactored calculateOrderTotal
and the original. Fix the implementation to make tests pass.
Do not modify the tests themselves."

# Run the full test suite to ensure nothing else was affected
npm test

Key principle of TDD refactoring: Test files are the constraint — they are not the subject of the refactoring. If tests fail after a refactor, fix the implementation, not the tests — unless you discover the test itself is wrong (in which case confirm whether the original behavior matches the test description before changing it).

6. AGENTS.md Refactoring Configuration

For projects with ongoing refactoring needs, configuring refactoring standards in AGENTS.md lets Codex automatically follow project conventions in every session, eliminating the need to re-explain rules each time.

## Refactoring Standards
- Run `npm test` after every refactoring task to verify behavior is unchanged
- Public APIs must not be removed — only additions are allowed (backward compatibility)
- Follow project naming conventions: camelCase variables, PascalCase classes,
  UPPER_SNAKE_CASE constants
- Read CHANGELOG.md before refactoring to understand historical decisions
  and avoid repeating abandoned approaches
- Extracted utility functions go in src/utils/; service classes go in src/services/
- When creating a new file, also create its test file under __tests__/
- Do not introduce new external dependencies during refactoring

## Refactoring Progress Tracking
- Every module and its status are listed in the "Refactoring Progress" section at the end of this file
- Status markers: ⬜ pending, 🔄 in progress, ✅ completed

Save the above to your project's AGENTS.md (or append to the appropriate section if the file already exists). These standards will automatically apply to every subsequent Codex refactoring task.

AGENTS.md placement: A root-level AGENTS.md applies to the entire project. If a subdirectory needs different refactoring rules (e.g., packages/legacy/ requires more conservative strategies), create a separate AGENTS.md in that directory. Codex automatically merges both configurations, with the deeper path taking precedence.

7. Splitting Functions Over 50 Lines

Function length alone is not the problem — but beyond a certain size, functions tend to take on multiple responsibilities, and that is what actually needs refactoring. Here is a complete guide to splitting strategy by function size.

Step 1: Identify Responsibilities

# Ask Codex to analyze what a function is doing
codex "Analyze the handleCheckout function in src/controllers/checkout.ts (~180 lines).
List every responsibility it carries (describe each in one sentence).
Identify which responsibilities are tightly coupled and which can be extracted independently."

Step 2: Generate an Extraction Plan (dry-run)

# Based on the responsibility analysis, generate a concrete extraction plan
codex exec --dry-run "Based on the responsibility analysis, generate a function extraction plan
for handleCheckout:
- Each extracted function should have a clear name (verb + noun)
- Specify parameter and return types
- State the extraction order (extract leaf dependencies first)
Do not execute — output the plan only"

Step 3: Extract One Responsibility at a Time

# Extract one responsibility per task, verify after each
codex "Extract the inventory check logic from handleCheckout into
checkInventoryAvailability(items): Promise<void>.
The function should throw InsufficientStockError if stock is insufficient.
Replace the inline logic in handleCheckout with a call to the new function."

npm test  # Verify after every extraction

codex "Extract the payment processing logic from handleCheckout into
processPaymentTransaction(paymentInfo, amount): Promise<string>,
returning a transaction ID. Replace inline logic in handleCheckout."

npm test
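After both extractions, handleCheckout could read as a short sequence of named steps. In this sketch only the two function signatures and InsufficientStockError come from the prompts above; the `CartItem` and `PaymentInfo` shapes, the handleCheckout parameters, the in-memory `stock` map, and the fake transaction IDs are hypothetical stand-ins for a real inventory service and payment gateway.

```typescript
interface CartItem {
  sku: string;
  quantity: number;
}

interface PaymentInfo {
  cardToken: string;
}

class InsufficientStockError extends Error {
  constructor(sku: string) {
    super(`Insufficient stock for ${sku}`);
    this.name = "InsufficientStockError";
    // Keeps instanceof working if the code is compiled to ES5.
    Object.setPrototypeOf(this, InsufficientStockError.prototype);
  }
}

// Hypothetical in-memory inventory standing in for a real stock service.
const stock = new Map<string, number>([
  ["apple", 3],
  ["pear", 0],
]);
let transactionCounter = 0;

// Extracted responsibility 1: rejects with InsufficientStockError if any item is short.
async function checkInventoryAvailability(items: CartItem[]): Promise<void> {
  for (const item of items) {
    if ((stock.get(item.sku) ?? 0) < item.quantity) {
      throw new InsufficientStockError(item.sku);
    }
  }
}

// Extracted responsibility 2: returns a transaction ID.
// A real version would call a payment gateway; this one only builds a fake ID.
async function processPaymentTransaction(paymentInfo: PaymentInfo, amount: number): Promise<string> {
  if (amount <= 0) throw new Error("amount must be positive");
  transactionCounter += 1;
  return `tx-${paymentInfo.cardToken}-${transactionCounter}`;
}

// The orchestrator: each line is one named responsibility.
async function handleCheckout(items: CartItem[], paymentInfo: PaymentInfo, amount: number): Promise<string> {
  await checkInventoryAvailability(items);
  return processPaymentTransaction(paymentInfo, amount);
}
```

Extracting the inventory check first follows the "leaf dependencies first" order from Step 2: it has no dependency on payment, so it can be moved and tested on its own.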

Function Size	Recommended Strategy	Extraction Granularity
50–100 lines	Identify 2–3 responsibilities and extract directly	One responsibility per task
100–300 lines	Map responsibilities first, then extract from innermost to outermost	3–5 tasks, one responsibility each
300+ lines	Consider splitting into separate classes or modules, not just functions	Extract to private methods first, then evaluate class splitting

Beware of over-extraction: Not every large function needs splitting. A 200-line function with a single, complex responsibility (like protocol parsing or a mathematical computation) may become less readable if forced apart. The test: can every extracted function be described in one clear sentence?

8. FAQ

Will Codex refactoring change my function behavior?

Not if you constrain it. State explicitly in your prompt that behavior must be preserved, for example "preserve behavior" or "keep the function signature unchanged", and Codex will restructure only the internals while aiming to keep inputs and outputs identical. Language models are not infallible, though, so run your test suite after every refactoring task. Including "run npm test after refactoring and confirm all tests pass" in your prompt is the gold-standard practice.

How do I ensure refactoring doesn't introduce bugs?

Use a three-layer safety net:

Preview phase: Use --dry-run to review the scope of changes before applying anything
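Isolation phase: Work on a git branch so you can revert instantly: git checkout . discards uncommitted edits to tracked files, git clean -fd removes untracked files (including any new files Codex created), and abandoning the branch discards committed work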
Verification phase: Generate test coverage before refactoring, then run tests after to confirm behavior is unchanged

Writing "run npm test after every refactoring task" into AGENTS.md makes Codex do this automatically, with no manual reminder needed.

Is refactoring a large project (100K+ lines) feasible?

Completely feasible, but an incremental strategy is mandatory — never attempt to refactor the entire codebase in a single task. The recommended approach: refactor module by module in 200–500 line increments; start from leaf nodes (utility functions with the fewest dependencies); work inward toward core modules; track progress in AGENTS.md to maintain cross-session context. For very large codebases, using GPT-4.1's 1M token context (codex --model gpt-4.1) improves handling of modules with complex dependency graphs.

Which files does Codex read during refactoring?

Codex automatically reads: (1) files explicitly named in your prompt; (2) direct dependencies found via static analysis (import chains); (3) related test files if they exist. Use the --context flag to force-include specific files such as architecture docs or type definitions. Use .codexignore to exclude directories that don't need to be read (like node_modules). In suggest mode, the complete list of files planned for modification is displayed before any changes are applied, so you can see exactly what will change before approving it.

More multi-file operation techniques: Refactoring frequently involves cross-file operations. See the Multi-File & Large Codebase Guide for complete strategies on context window management, .codexignore configuration, and incremental refactoring.